data about points on the surfaces of objects, with the intent of selecting sensory points 
the position and orientation of the object, and we derive analytic expressions for such 
error for the case of one particular approach to object recognition. 
Problem Definition 


A robot often must recognize and locate objects in its workspace, or more informally, must 
use sensory information to determine what objects are where, in order to manipulate them. 
Since speed of operation is also an important consideration in robotics applications, the 
interaction of sensing and action should take place using a minimal amount of sensory 
data. This requires methods for optimally (or near optimally) selecting positions at 
which to obtain sensory data. Clearly, the notion of optimal selection of new data points 
will in part be tied to the specific recognition engine used to interpret those data points. 
In previous papers [Gaston and Lozano-Perez 84; Grimson and Lozano-Perez 84, 85a, 
85b] we have presented a constraint-based recognition and localization technique that 
uses as input sparse, noisy, occluded measurements of the position and orientation of 
small patches of an object's surface, obtained from any of several sensing modalities. 
Applying this recognition system to such sensory input data results in a small set of 
object poses, that is, a set of transformations taking a known object model from an 
intrinsic coordinate system into a coordinate system defined relative to the sensor. In 
this paper, we consider the problem of disambiguating among this fixed set of object 
poses. Note that the set of poses could include poses corresponding to different objects. 
To disambiguate among a set of interpretations, we need to acquire sensory data that will 
clearly distinguish one pose of an object from another, using as few additional sensory 
points as possible. Thus, our problem is to optimally select places at which to obtain 
the needed sensory data. 

While we use the recognition system developed in [Grimson and Lozano-Perez 84, 
85a, 85b] as the basis for investigating sensing strategies for disambiguation, we expect 
that some of the results of this investigation should have application in more general 
situations of recognition and localization. To illustrate this, we begin with a set of 
examples of the use of sensing strategies. 


Example I: Disambiguating Multiple Interpretations 

Suppose we are given a sparse set of sensory data points, each recording the position 
and orientation of a small patch of some surface in the workspace of a robot. Our goal 
is to determine what objects, from a set of known objects, are consistent with this data, 
together with the pose (position and orientation) of the object that leads to such a 
consistent interpretation. In the case of sensory data known to lie entirely on one object, 
we take consistent to mean that a rigid transformation of the object will cause all of the 
data points to lie on the object, with the correct surface orientation (to within some known 
error bounds). In the case of sensory data that may come from more than one object, we 
take consistent to mean that a maximum subset of the data satisfies the above condition. 
In this case, of course, other definitions of consistency are possible. 

In [Grimson and Lozano-Perez 84, 85a, 85b] we described an efficient constrained 
search technique for matching the sensory data to faces of an object model, in order 
to find the interpretations of the data. The sensory data consist of measurements of 
the position and surface orientation of small patches of object surfaces. The objects 
are modeled by sets of planar face equations. The technique uses efficient constraints 
between data elements and model elements to determine the set of interpretations of the 
data consistent with the model, that is, the set of poses of the object that agree with the 
input data. Empirical testing, as well as theoretical analysis [Grimson 84], indicates that 
in general, there will be only one consistent interpretation of the data. It is possible, 
however, that more than one pose of the object will be consistent with the data, even for 
non-symmetric objects and even if the object is known (see Figure 1). To determine the 
correct pose, we will need additional sensing. 



Figure 1. Example of multiple interpretations. Given only a sparse set of isolated data points, multiple 
interpretations such as those indicated may be possible. 

The simplest method for obtaining the supplementary sensory data is to sample the 
object at random. If the sensing process is fast enough, and if, on average, only a few 
additional points are required in order to remove the ambiguity, then such a random 
sensing strategy could suffice. It is easy, however, to find situations in which a random 
sensing strategy would be ineffective in disambiguating between possible interpretations, 
and in general one expects a random sensing strategy to converge slowly. 
Moreover, some sensing modalities, for example tactile sensing, are inherently sparse, 
and each additional sensing point is expensive to obtain. In this case, it is 
particularly desirable to perform recognition with minimal sensory interaction. 
Table 1 - Histograms of points needed for disambiguation. Each column indicates the number of sensory 
points needed to force a unique interpretation, and the number indicated in that column is the number 
of trials, out of 100, for which that number of data points was required. 


For example, Table 1 lists histograms of the number of additional, randomly chosen 
sensing points needed to uniquely disambiguate several consistent interpretations. We 
generated an initial set of 9 data points, all lying on a single object (shown in 
Figure 1), and determined the set of consistent interpretations of that data, using the system 
described in [Grimson and Lozano-Perez 85a, 85b]. We then generated additional sense 
points until only one interpretation remained consistent with the data. This process 
was repeated for 100 trials, and the number of sense points needed to disambiguate the 
interpretations was histogrammed. The results are recorded in Table 1, where each entry 
is the number of trials terminating with the indicated number of sense points. The sensory 
data were generated by randomly choosing approach directions towards the object 
and sensing for contact along them, much as might occur in tactile sensing. It can be 
seen that choosing sensing directions at random may converge slowly towards 
a unique interpretation, even though in this case we are dealing only with the simple 
case of data from a single known object. 

In general, one would expect a tradeoff between random sensing strategies and 
feature-driven sensing strategies. Given two possible interpretations of the data, consider 
constructing the volume difference, consisting of all points contained in one but not both 
of the interpretations. If the size of this volume relative to the volume of the object is 
large, then in general, one would expect randomly generated additional sensing points to 
quickly disambiguate the situation. On the other hand, if the relative volume is small, 
one would expect that a large number of additional sense points would be needed before 
one of them struck this volume difference. In this case, a more directed sensing strategy 
is likely to be more effective. 
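To make the volume-difference argument concrete, here is a minimal sketch under a simplifying assumption that is ours, not the paper's: each random probe independently distinguishes the two interpretations with some probability p, which grows with the relative size of the volume difference. The number of probes needed is then geometrically distributed with mean 1/p, so a small volume difference means many probes. The functions `expected_probes`, `probes_until_hit` and `histogram` are hypothetical helpers; `histogram` mimics the 100-trial protocol behind Table 1 but does not reproduce its data.

```python
import random
from collections import Counter


def expected_probes(p):
    """Mean number of independent random probes until one hits the volume
    difference, assuming each probe hits it with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p


def probes_until_hit(p, rng):
    """Draw probes until one lands in the volume difference; return the count."""
    count = 1
    while rng.random() >= p:
        count += 1
    return count


def histogram(p, trials=100, seed=0):
    """Histogram {number of probes: number of trials}, in the spirit of Table 1."""
    rng = random.Random(seed)
    return Counter(probes_until_hit(p, rng) for _ in range(trials))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.02):
        h = histogram(p)
        mean = sum(k * v for k, v in h.items()) / sum(h.values())
        print(f"p={p}: expected {expected_probes(p):.1f}, observed mean {mean:.1f}")
```

Halving the relative size of the volume difference roughly doubles the expected number of random probes, which is the regime where a directed strategy pays off.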


Example II: Localization with Minimal Sensing 


In the previous example, we discussed the problem of generating additional sensory data, 
given some initial set of data and the interpretations consistent with it. A related problem 
is to consider the optimal acquisition of all of the sensory data, rather than just that 
needed to disambiguate interpretations. For example, consider a situation in which a 
known object, with a fixed set of known stable positions, is being sensed. This might 
be the case, for example, when considering objects in pallets or feeders. We would like 
to determine the pose of the object with as few sensory points as possible. Here, the 
initial set of interpretations is the set of stable configurations of the object. Given this 
set of stable configurations, we want to determine the optimal sensing directions for 
distinguishing that set of configurations. 


Example III: Simple Inspection 


The problem of determining sensing positions can also arise in simple inspection tasks. 
Suppose we are given an object pose, and a set of distinctive points defined on the object 
model. In this case, we may be able to use the techniques developed below to choose 
the sensing rays needed to test that the designated distinctive model points are in fact 
present in the sensed object. 
Assumptions 

Thus, the problem to be addressed in this paper is finding effective and rigorous sensing 
strategies for deciding between a set of possible poses of an object, or multiple objects. 
We will assume that the following are given: 

• Set of Interpretations — Some initial set of possible interpretations is assumed to be given. 
This could be either from the application of some recognition process to a set of initial 
sensed points, or from assumptions about the object to be sensed, in particular 
that it is lying in one of a known number of stable positions. In each case, the 
interpretation includes a computed transformation giving the pose of the model in 
sensor coordinates. 

• Set of Sensing Directions — It is assumed that the initial sensory data were 
generated by sampling along a set of known directions. For example, in the case of 
visual sensing these could be given by the orientation of the cameras relative to the 
workspace. In general, determining optimal sensing rays is a four degree of freedom 
problem. In this paper, we assume that the two rotational degrees of freedom are 
restricted to a small set of possibilities by the sensing geometry, such as the given 
camera orientations. We then optimize over the remaining two degrees of freedom. 

• Polyhedral Object Models — We assume that the objects to be sensed have been 
modeled as polyhedra, although the objects themselves need not be polyhedral. Any 
deviations between curved objects and their polyhedral models will simply contribute 
to a small amount of error in the sensory data, to which a recognition system should 
be insensitive. 

The goal is to disambiguate between the set of interpretations by determining positions at 
which to obtain subsequent sensory information. These positions should be such that by 
sensing along one of the possible directions, the recorded information will disambiguate 
between the set of possible interpretations (or some subset of the interpretations) in the 
presence of possible error in the computed transformations associated with each of the 
interpretations. 

In the examples given above, we assumed that we had available techniques for 
acquiring the sensory data, and techniques for solving the recognition and localization 
problem. There are, of course, many techniques for obtaining information about the 
three-dimensional positions of points on an object, as well as the local surface normals 
at those points. Typical examples of such measurement processes include tactile sensing [e.g. 
Harmon 82, Hillis 82, Overton and Williams 81, Purbrick 81, Raibert and Tanner 82, 
Schneiter 82], binocular stereo [e.g. Baker and Binford 81, Barnard and Thompson 80, 
Grimson 81, 85, Marr and Poggio 79, Mayhew and Frisby 81, Ohta and Kanade 85], 
photometric stereo [e.g. Ikeuchi and Horn 79, Woodham 78, 80, 81], laser range-finding 
[e.g. Lewis and Johnston 77, Nitzan, Brain, and Duda 77], and structured-light systems 
[e.g. Popplestone, et al. 75, Shirai and Suwa 71]. These methods usually provide such 
measurements with some error. 

A number of different techniques have been developed for model-based recognition 
and localization. If one views recognition as a search for a consistent match between 
data elements and model elements, then much of the variation between existing 
recognition schemes can be accounted for by the choice of what descriptive tokens to match. 
Examples of techniques relying on sparse distinctive features include the use of a few 
extended features [Perkins 78, Ballard 81], the use of one feature as a focus, with the 
search restricted to a few nearby features [Tsuji and Nakamura 75, Holland 76, Sugihara 
79, Bolles and Cain 82, Bolles, Horaud and Hannah 83], matching of high-level 
descriptions [Nevatia 74, Nevatia and Binford 77, Marr and Nishihara 78, Brooks 81, Brady 82], 
and the use of geometric relationships between simple descriptors [Horn 83, Horn and 
Ikeuchi 83, Ikeuchi 83, Faugeras and Hebert 83, Gaston and Lozano-Perez 84, Grimson 
and Lozano-Perez 84, Stockman and Esteva 84, Brou 84]. The basis for the present work 
is the approach presented in [Gaston and Lozano-Perez 84, Grimson and Lozano-Perez 
84, 85a, 85b]. 

For the purposes of this paper, we will assume that such techniques are available. 
Our concentration is on the problem of choosing optimal sensing strategies for interacting 
with such techniques. 


An Algorithm For Computing Sensing Directions 


To demonstrate the approach of computing sensing directions, we first look at an example 
in two dimensions (see Figure 2), where the object has three degrees of positional freedom 
(one rotational and two translational). 



Figure 2. Two dimensional example of multiple poses. Both poses are consistent with the sensory data, 
indicated by the small surface normals and the points of contact. 


After our recognition and localization process has been applied to a sparse set of data 
points, we are left with some set of poses of the object consistent with that data. We are 
given a set of sensing directions, that is, a set of unit vectors $\mathbf{s}_i$, indexed over $i \in I$, such 
that sensing can occur along directions parallel to any of these unit vectors, for some set 
of initial positions. For example, in Figure 3, if $\mathbf{o}$ is an offset vector, where $\mathbf{o}\cdot\mathbf{s}_i = 0$, 
then we can sense along the ray $\mathbf{o} + \alpha\mathbf{s}_i$ as $\alpha$ varies. Equivalently, we can think of this 
as having some finite portion of a plane perpendicular to $\mathbf{s}_i$, such that for any point on 
the plane, we can sense along a ray through that point in the direction of $\mathbf{s}_i$. 
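A sensing ray is thus fixed by a direction $\mathbf{s}_i$ plus a two-parameter offset $\mathbf{o}$ in the plane perpendicular to it. The sketch below is a minimal illustration with hypothetical helper names (`plane_basis`, `ray_point`, `to_sensing_coords`): it builds an orthonormal basis $(\mathbf{u}, \mathbf{w})$ of that plane so the offset becomes plane coordinates $(a, b)$ with $\mathbf{o} = a\mathbf{u} + b\mathbf{w}$. Which helper axis is used to build the basis is an arbitrary convention of ours.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def plane_basis(s):
    """Return (s, u, w): unit s plus an orthonormal basis (u, w) of the plane o.s = 0."""
    s = normalize(s)
    # Use the coordinate axis least aligned with s so the cross product is well conditioned.
    helper = min(((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
                 key=lambda e: abs(dot(e, s)))
    u = normalize(cross(s, helper))
    w = cross(s, u)                  # already unit length: s and u are orthonormal
    return s, u, w


def ray_point(s, offset_ab, alpha):
    """Point o + alpha*s on the sensing ray, with offset o = a*u + b*w."""
    s, u, w = plane_basis(s)
    a, b = offset_ab
    return tuple(a * ui + b * wi + alpha * si for ui, wi, si in zip(u, w, s))


def to_sensing_coords(p, s):
    """Inverse map: plane offset (a, b) and ray parameter alpha of point p."""
    s, u, w = plane_basis(s)
    return (dot(p, u), dot(p, w)), dot(p, s)
```

Projecting model vertices with `to_sensing_coords` gives exactly the sensing-plane coordinates used by the projection step of the algorithm, and the returned `alpha` is the distance along the sensing direction.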



Figure 3. Examples of the sensing geometry. Each vector s defines a sensing direction. The actual 
sensing ray is defined by specifying an offset vector o, relative to the origin of the sensing plane, through 
which the ray must pass, parallel to s. 


We are also given some bounds on the sensitivity of the sensing device in measuring 
surface normals and surface positions. In particular, we define $\epsilon_n$ and $\epsilon_d$ in the following 
manner, illustrated in Figure 4. If $\mathbf{n}_{true}$ is the surface normal at some point on an object, 
measured in sensor coordinates, and $\mathbf{n}_{sense}$ is the normal measured by the sensing device, 
then 

$$\mathbf{n}_{sense}\cdot\mathbf{n}_{true} \ge \cos\epsilon_n .$$

If $\mathbf{p}_{true}$ is the actual position of a point on an object, measured in sensor coordinates, 
and $\mathbf{p}_{sense}$ is the position measured by the sensing device, then 

$$\|\mathbf{p}_{sense} - \mathbf{p}_{true}\| \le \epsilon_d .$$

Thus, $\epsilon_n$ and $\epsilon_d$ describe the range of uncertainty in the measurements of normals and 
distances, respectively. 
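The two bounds amount to a simple membership test: a true normal and position are compatible with a measurement if the normal lies inside the cone of half-angle $\epsilon_n$ about the measured normal and the position inside the ball of radius $\epsilon_d$ about the measured position. This sketch assumes unit-length normals and angles in radians; the function name is ours.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def consistent_with_measurement(n_sense, p_sense, n_true, p_true, eps_n, eps_d):
    """True if (n_true, p_true) lies within the error cone and ball of a measurement.

    Normals are unit vectors; eps_n is a half-angle in radians, eps_d a distance."""
    normal_ok = dot(n_sense, n_true) >= math.cos(eps_n)     # cone of half-angle eps_n
    offset = [a - b for a, b in zip(p_sense, p_true)]
    position_ok = math.sqrt(dot(offset, offset)) <= eps_d   # ball of radius eps_d
    return normal_ok and position_ok
```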

The basic idea is that over the set of all given sensing directions $\{\mathbf{s}_i \mid i \in I\}$, we want 
to find a particular direction $\mathbf{s}_i$, and an offset position $\mathbf{o}$, such that sensing along the ray 
$\mathbf{o} + \alpha\mathbf{s}_i$ will distinguish the poses. By distinguish, we mean that for all pairs of possible 
poses, either the difference in the expected normals of the faces that intersect the ray, or 
the difference in the expected positions of the points of intersection of the ray with the 
corresponding faces of the poses, is greater than the sensitivity of the sensing device. 

We note, first, that a sensing ray need not intersect every one of the possible poses. 
Indeed, in the case of two possible poses, sensing rays that would contact 
only one of the poses are likely to be among the best candidates for disambiguating 
the two poses. Second, we note that if there are many possible poses, it may not be 
possible to find one sensing ray that will distinguish between all of them. Instead, we 
may have to use a series of measurements to determine the correct pose. The number of 
such measurements will, however, be bounded above by the number of poses. 

Figure 4. Error bounds. The true surface normal is known to lie within a specified cone of the measured 
normal, while the true position is known to lie within a specified ball about the measured position. 

The main problem to be faced in finding good sensing rays is the existence of error in 
the computed transformations associated with each pose. Thus, for the sensing strategy 
to be effective, the ray must both distinguish the poses, and be insensitive to errors in 
the position and orientation of the poses. 

The proposed method is quite simple and is illustrated in Figure 5, in which two 
poses of the object are shown, one in solid lines, the other in hashed lines. The steps of 
the method are as follows. 

1. Pick a particular sensing direction s (we will assume the convention that s points 
from the sensor towards the object). In the two-dimensional case, we can define a 
line perpendicular to the sensing direction, which we will call the sensing line, with 
origin at the point on the line closest to the origin of the sensor space. In three 
dimensions, this would be a sensing plane. This is shown in Figure 5a. 

2. We fix the position of this line at some arbitrary reference point, for example by 
specifying the minimum distance of the line from the origin of the space to be d. 
This is shown in Figure 5b. 

3. Now consider one of the poses, for example, the one shown in hashed lines in the 
figure. For each face $f_i$ in the model, with corresponding model unit normal $\mathbf{n}_{m,i}$, we 
let $\mathbf{n}_{s,i}$ denote the unit normal rotated into sensor coordinates, i.e. corresponding 
to the orientation of the face relative to the pose of the object. If the face points 
towards the sensor ($\mathbf{s}\cdot\mathbf{n}_{s,i} < 0$), we project the boundaries of the face onto the 
sensing line, as shown for example in Figure 5c. In other words, each end point 
$\mathbf{e}$ of the edge is projected to a point on the sensing line, $\mathbf{e} + (d - \mathbf{e}\cdot\mathbf{s})\mathbf{s}$. In three 
dimensions, this would entail the projection of the edges of a face onto the sensing 
plane. 

4. We can label the resulting segment of the s-line with the surface normal $\mathbf{n}_{s,i}$ and 
with the range of distances from the object face to the s-line. That is, if $\mathbf{v}$ is a point 
on the edge, in sensor coordinates, then $\mathbf{v}\cdot\mathbf{s} - d$ is the distance from the point to 
the s-line. We let $a_{\min,i}$ and $a_{\max,i}$ denote the extreme values taken on by $\mathbf{v}\cdot\mathbf{s} - d$ 
as $\mathbf{v}$ ranges over the edge, and the segment is labeled by 

$$\{\mathbf{n}_{s,i},\ [a_{\min,i},\ a_{\max,i}]\} .$$




Figure 5. Projection of Poses onto the Sensing Line. In part (a) the sensing line is indicated, orthogonal 
to the sensing direction. In part (b) this line is fixed at a distance d from the origin. In part (c) the 
visible faces of one of the poses are projected onto the sensing line defined by the sensing ray. This 
projection defines a partition of the sensing line. Here s is the sensing ray and d is the distance to the 
origin. In part (d), the visible faces of the second pose are also projected onto the sensing line. In part 
(e), the respective partitions are tested for distinguishability, based on differences in expected surface 
orientation and differences in expected position, and the distinguishable regions of the sensing line are 
marked. Using a sensing ray through the midpoint of either of the two marked regions would enable one 
to disambiguate the two poses, as shown in part (f), in which the expected sensory points are indicated. 
5. When all the visible faces of a pose have been projected onto the s-line, we can 
perform hidden surface removal, to reduce the set of possibly overlapping segments 
of the s-line to a set of disjoint segments. Each segment will be labeled by the 
surface orientation of the corresponding face, in sensor coordinates, and the range 
of distances to points on the face. 

6. We can perform this operation for each pose, obtaining a different disjoint partition 
of the s-line, labeled by the appropriate surface normals and distance ranges, as 
shown in Figure 5d (slightly offset for graphical clarity). 

7. Next, we intersect the set of all such partitions. That is, we define a new partition of 
the s-line with two properties. First, each segment of this new partition lies within 
exactly one segment of each of the partitions of the s-line corresponding to a pose. 
Second, this new partition is the smallest (in terms of number of segments) such 
partition. The label associated with each segment of the new partition is the union 
of the labels of the corresponding segments of the individual partitions. 

8. This partition can now be analyzed for distinguishability. More precisely, given a 
segment of the partition, the set of normals (one per pose, indexed by $j \in J$) 

$$\{\mathbf{n}_j \mid j \in J\}$$

associated with that segment is distinguishable if 

$$\max_{i \ne j}\,(\mathbf{n}_i\cdot\mathbf{n}_j) < \cos 2\epsilon_n .$$

In other words, given a measurement of the actual object in this region, we can 
uniquely determine to which pose it corresponds, since the error cones of half-angle 
$\epsilon_n$ about any two expected normals do not overlap. Similarly, the set of distance 
measurements 

$$\{[a_{\min,j},\ a_{\max,j}] \mid j \in J\}$$

is distinguishable if, for every pair $i \ne j$, 

$$\max\{a_{\min,i} - a_{\max,j},\ a_{\min,j} - a_{\max,i}\} > 2\epsilon_d ,$$

that is, if the distance ranges, each widened by $\epsilon_d$, are pairwise disjoint. 

We can collect all such distinguishable segments of the partition, thereby determining 
the set of possible sensing points along the particular choice of s. This is illustrated 
in Figure 5(e). 

If there were no error in the transformations associated with the poses, we would be 
done, since any point in this set would disambiguate the poses (see Figure 5(f) for an 
example). To account for possible error in the transformations associated with the poses, 
however, we need to be judicious in our choice of sensing point. The basic idea 
is to choose a point such that the face with which contact is made remains the same under 
small perturbations of the transformation. In two dimensions this is most easily done 
by choosing the midpoint of the longest segment. In three dimensions, the easiest way 
to choose such a point, from among the set of distinguishable polygons on the sensing 
plane, is to apply the notion of a Chebychev point, defined as follows. Suppose we 
are given a polygon in a plane, each of whose edges is defined by a pair $(\mathbf{n}_j, d_j)$, where 
$\mathbf{n}_j$ is a unit normal lying in the plane, and $d_j$ is a constant such that points along the 
edge are defined by 

$$\{\mathbf{v} \mid \mathbf{v}\cdot\mathbf{n}_j - d_j = 0\} .$$

Then, taking each $\mathbf{n}_j$ to point into the polygon, the distance from any point $\mathbf{v}$ inside the polygon to an edge is given by 

$$\mathbf{v}\cdot\mathbf{n}_j - d_j .$$

The Chebychev point of a polygon is the point that maximizes the minimum distance 
from the point to any edge of the polygon, that is, the point $\mathbf{v}$ that satisfies 

$$\min_j\,(\mathbf{v}\cdot\mathbf{n}_j - d_j) \ge \min_j\,(\mathbf{u}\cdot\mathbf{n}_j - d_j) \quad \forall\,\mathbf{u},$$

where the value taken by this expression at the Chebychev point is called the Chebychev 
value of the polygon. Clearly, the polygon with the maximum Chebychev value will 
be the least sensitive to perturbations in the computed transformations, and thus the 
Chebychev point with the maximum Chebychev value, as measured over the set of all 
distinguishable polygons, defines the best sensing position. Note that we can improve 
the reliability of the sensing strategy even further by choosing the maximum Chebychev 
point as measured over connected sequences of distinguishable segments. 

9. We repeat this process over all sensing directions $\mathbf{s}_i$, choosing the direction that best 
distinguishes the feasible poses. 
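The two-dimensional procedure can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code, and it rests on several assumptions: each pose is already transformed into sensor coordinates and given as a list of edges `(p, q, n)` with `n` the outward unit normal; poses are simple polygons (or disjoint unions of them), so visible edges never cross between projected endpoints; and, instead of building each pose's partition and then intersecting them, it cuts the sensing line at every projected endpoint of every pose. That yields a common refinement of all the partitions, finer than the smallest partition of step 7, so the depth ranges it tests are never wider. A pair of poses counts as distinguished when the ray meets exactly one of them, or when the normals or the depth ranges are separated by more than the sensor error. All function names are hypothetical.

```python
import math
from itertools import combinations


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def visible_spans(pose, s, d=0.0):
    """Project the visible edges of one pose onto the sensing line (steps 3-4).

    Each span is (t_lo, t_hi, depth_lo_end, depth_hi_end, normal), where t is the
    coordinate along the line and depth = e.s - d is the distance beyond it."""
    perp = (-s[1], s[0])
    spans = []
    for p, q, n in pose:
        if dot(s, n) >= 0.0:                 # face points away from the sensor
            continue
        tp, tq = dot(p, perp), dot(q, perp)
        ap, aq = dot(p, s) - d, dot(q, s) - d
        if tp > tq:
            tp, tq, ap, aq = tq, tp, aq, ap
        if tq - tp > 1e-12:                  # ignore edges seen exactly edge-on
            spans.append((tp, tq, ap, aq, n))
    return spans


def depth_at(span, t):
    t0, t1, a0, a1, _ = span
    return a0 + (a1 - a0) * (t - t0) / (t1 - t0)


def front_label(spans, t0, t1):
    """Hidden-surface removal on [t0, t1] (step 5): nearest face's label, or None."""
    tm = 0.5 * (t0 + t1)
    covering = [sp for sp in spans if sp[0] < tm < sp[1]]
    if not covering:
        return None
    sp = min(covering, key=lambda c: depth_at(c, tm))
    da, db = depth_at(sp, t0), depth_at(sp, t1)
    return sp[4], min(da, db), max(da, db)


def pair_distinguishable(la, lb, eps_n, eps_d):
    if la is None or lb is None:
        return (la is None) != (lb is None)            # ray meets exactly one pose
    (na, lo_a, hi_a), (nb, lo_b, hi_b) = la, lb
    if dot(na, nb) < math.cos(2.0 * eps_n):            # normals separable
        return True
    return max(lo_a - hi_b, lo_b - hi_a) > 2.0 * eps_d  # depth ranges separable


def best_sensing_offset(poses, s, eps_n, eps_d, d=0.0):
    """(midpoint, length) of the longest run of distinguishable intervals, or None."""
    spans = [visible_spans(pose, s, d) for pose in poses]
    cuts = sorted({t for sps in spans for sp in sps for t in sp[:2]})
    runs, start = [], None
    for t0, t1 in zip(cuts, cuts[1:]):
        labels = [front_label(sps, t0, t1) for sps in spans]
        ok = all(pair_distinguishable(a, b, eps_n, eps_d)
                 for a, b in combinations(labels, 2))
        if ok and start is None:
            start = t0
        elif not ok and start is not None:
            runs.append((start, t0))
            start = None
    if start is not None:
        runs.append((start, cuts[-1]))
    if not runs:
        return None
    lo, hi = max(runs, key=lambda r: r[1] - r[0])
    return 0.5 * (lo + hi), hi - lo
```

The returned midpoint `t` is a coordinate along the sensing line, measured along `s` rotated by +90 degrees, so the chosen ray is `t * perp + alpha * s`. Calling the function for each candidate direction and keeping the longest run corresponds to step 9.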

While this analysis has been done in two dimensions, it clearly extends to the general 
three dimensional case. Here, the visible faces are projected into polygons on a sensing 
plane, and the intersection of the projections over all poses gives a partition of this plane, 
which can be tested for distinguishability. 
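In three dimensions each distinguishable region on the sensing plane is a polygon, and step 8 selects its Chebychev point. For a convex polygon this is a small linear program in $(x, y, r)$: maximize $r$ subject to $\mathbf{v}\cdot\mathbf{n}_j - d_j \ge r$ for every edge. The sketch below is our own illustration, not the paper's procedure: it solves the program by brute force, using the fact that an optimum occurs where three edge constraints are tight, so it solves every 3-by-3 system by Cramer's rule and keeps the best feasible solution. It assumes counterclockwise vertex order, so the computed normals point inward. The cost is O(m^4) in the number of edges, adequate only for small polygons; a general LP solver is the usual choice otherwise.

```python
import math
from itertools import combinations


def edges_from_ccw_vertices(verts):
    """(n_j, d_j) with n_j the inward unit normal, so v.n_j - d_j >= 0 inside."""
    verts = list(verts)
    out = []
    for (px, py), (qx, qy) in zip(verts, verts[1:] + verts[:1]):
        ex, ey = qx - px, qy - py
        length = math.hypot(ex, ey)
        n = (-ey / length, ex / length)      # left normal of a CCW edge points inward
        out.append((n, n[0] * px + n[1] * py))
    return out


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def chebychev_point(edges, tol=1e-9):
    """Maximize min_j (v.n_j - d_j); returns ((x, y), chebychev_value) or None."""
    best = None
    for tri in combinations(edges, 3):
        # Tight constraints: n_x x + n_y y - r = d for each edge of the triple.
        a = [[n[0], n[1], -1.0] for n, _ in tri]
        b = [d for _, d in tri]
        det = det3(a)
        if abs(det) < tol:
            continue
        sol = []
        for col in range(3):                 # Cramer's rule
            m = [row[:] for row in a]
            for i in range(3):
                m[i][col] = b[i]
            sol.append(det3(m) / det)
        x, y, r = sol
        if all(n[0] * x + n[1] * y - d >= r - tol for n, d in edges):
            if best is None or r > best[1]:
                best = ((x, y), r)
    return best
```

For a non-square rectangle the optimum is not unique (any point on a central segment works); the enumeration simply returns one endpoint of that segment.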


An Implementation of the Technique 

In testing the proposed algorithm, we have chosen a slightly modified implementation of 
the technique, which avoids some of the difficulties of performing hidden surface removal 
and of intersecting polygonal partitions of a plane. One means of circumventing these 
difficulties is to use a regular grid tessellation of the plane. 

In particular, suppose that we partition the s-plane with a rectangular grid whose 
elements have sides of length h. Rather than trying to compute polygonal regions on the 
s-plane that are distinguishable, we shall examine each grid segment within the bounds 
of the projected object, seeking those segments that are themselves distinguishable, and 
then we will piece these grid elements back together. 

The steps of the new algorithm, many of which are identical to those of the previous 
solution, are sketched below. 

• Initially, mark all grid segments as active. 

• Given a pose and a sense direction s, test each face for visibility. If the normal of 
the face, in sensor coordinates, is given by $\mathbf{n}_s$, then the face is visible if $\mathbf{s}\cdot\mathbf{n}_s < 0$. 

• For each visible face, project its vertices onto the s-plane, resulting in a set of new 
vertices that define a polygon on the plane. 

• Given this polygon on the sensing plane, compute the smallest bounding rectangle, 
composed of an integral number of grid elements, that encloses the projected 
polygon. This rectangle has no intrinsic merit, but is simply a convenient means of 
restricting the search process. 

• For each grid element lying in this enclosing rectangle, apply the following test. If 
the grid segment lies entirely outside the polygon, nothing is done. If some edge of 
the polygon passes through the segment, the segment is marked as inactive. If the 
grid segment is still active and lies entirely within the polygon, a label is attached 
to it. This label has two elements. The first is the normal 
of the face whose projection resulted in the current polygon on the sensing plane, 
measured in sensor coordinates. The second is the range of possible positions that 
could be obtained by intersecting a sensing ray passing through a point in this grid 
segment with the face of the underlying interpretation. If the vector $\mathbf{o}$, lying in the 
s-plane, defines the midpoint of the grid segment, this range is given by 

$$\alpha_0 \pm \frac{\sqrt{2}\,h}{2}\tan\theta$$

where $\alpha_0$ is the value of $\alpha$ for which the ray $\mathbf{o} + \alpha\mathbf{s}$ intersects the face of the pose, $h$ 
is the size of the grid segment, $\theta$ is given by $\cos\theta = -\mathbf{n}_s\cdot\mathbf{s}$, and $\mathbf{n}_s$ is the normal of 
the face in sensor coordinates. The bound holds because no point of the grid segment lies farther 
than $\sqrt{2}\,h/2$ from its midpoint, and the depth of the face changes by at most $\tan\theta$ per unit 
of displacement in the sensing plane. 

• Repeat this process for all visible faces. This results in a set of active grid segments, 
each of which is labeled by possibly several labels of the type described above. 
This set of labeled active grid segments represents the equivalent of the partition 
of the sensing plane described in the ideal solution. Note that we have avoided the 
hidden surface problem by incorporating multiple labels for a grid segment, from a 
single pose. This may reduce the number of distinguishable segments, by applying 
additional constraints on the criteria of distinguishability, but it also greatly reduces 
the computational expense of the process. 

• Once a partition of the grid is obtained for each pose, test the grid segments for 
distinguishability. First, only grid segments that are active in all poses are considered. 
Such a segment is considered distinguishable if for all pairs of sets of labels, either 
all the face normals of one label are distinguishable from all the face normals of the 
other (in the sense defined in the previous section), or all the distance ranges of one 
label are distinguishable from all the distance ranges of the other (also in the sense 
defined in the previous section). 

• Finally, collect the set of distinguishable grid segments into convex connected 
components. 

• Compute the best sensing position as the center of the largest square (with sides 
an integral number of grid segments) that can be placed entirely within the set 
of distinguishable grid segments. Note that if the square has sides of length $w$, then 
the Chebychev value for the segment is at least $w/2$. This process can be repeated 
over all sensing directions, and the midpoint of the largest such connected convex 
collection of distinguishable grid segments can be used to define the best sensing 
position. To save on computation, it is also possible to define a minimum size for 
an acceptable convex connected component, and to apply this process only until the 
first such acceptable component is obtained. 
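The grid labels and the per-cell test can be sketched as follows. The names are hypothetical. A label is a `(normal, (lo, hi))` pair; a pose's entry is `None` when the cell is inactive for that pose (a projected edge crosses it) and an empty list when the cell is active but no face of that pose projects onto it. Treating "the ray meets one pose and misses another" as distinguishing follows the earlier remark about rays that contact only one pose; carrying that over to the grid scheme is our assumption.

```python
import math
from itertools import combinations, product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cell_depth_range(alpha0, h, n_s, s):
    """alpha0 +/- (sqrt(2) h / 2) tan(theta) for a grid cell of side h.

    alpha0 is where the ray through the cell midpoint meets the face, and theta is
    the angle between the face normal n_s and -s, i.e. cos(theta) = -n_s . s."""
    cos_t = -dot(n_s, s)
    if cos_t <= 0.0:
        raise ValueError("face is not visible along s")
    tan_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t)) / cos_t
    half = math.sqrt(2.0) * h / 2.0 * tan_t
    return alpha0 - half, alpha0 + half


def normals_apart(n1, n2, eps_n):
    return dot(n1, n2) < math.cos(2.0 * eps_n)


def ranges_apart(r1, r2, eps_d):
    return max(r1[0] - r2[1], r2[0] - r1[1]) > 2.0 * eps_d


def cell_distinguishable(labels_per_pose, eps_n, eps_d):
    """labels_per_pose[k] is None (cell inactive for pose k) or a list of labels."""
    if any(labels is None for labels in labels_per_pose):
        return False                      # only cells active in all poses count
    for la, lb in combinations(labels_per_pose, 2):
        if not la and not lb:
            return False                  # the ray would miss both poses
        pairs = list(product(la, lb))     # empty if the ray misses exactly one pose
        by_normal = all(normals_apart(x[0], y[0], eps_n) for x, y in pairs)
        by_range = all(ranges_apart(x[1], y[1], eps_d) for x, y in pairs)
        if not (by_normal or by_range):
            return False
    return True
```

Keeping several labels per cell from one pose, rather than resolving occlusion, makes the test stricter, which matches the trade-off described above: fewer distinguishable cells in exchange for no hidden-surface computation.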

In Figure 6, we illustrate the above technique on the multiple poses of Figure 1. Note 
that each of the small circles denotes a point on the grid of the sensing plane that is 
distinguishable. We can then determine the best sensing position by finding the largest 
square area filled by such distinguishable points. The figure illustrates the computation 
for each of three different sensing rays. 
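Finding the largest square filled with distinguishable grid points is the classic maximal-square dynamic program: the side of the largest all-true square whose bottom-right cell is (i, j) is one more than the minimum of the values at its top, left and top-left neighbours. A minimal sketch, with a hypothetical function name, returning the side length in cells and the square's centre in fractional cell indices:

```python
def largest_square(grid):
    """Largest all-true square in a boolean grid indexed grid[row][col].

    Returns (k, (row_centre, col_centre)) with side k in cells and the centre in
    fractional cell indices, or None if no cell is distinguishable."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    side = [[0] * cols for _ in range(rows)]
    best_k, best_i, best_j = 0, -1, -1
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            if i == 0 or j == 0:
                side[i][j] = 1
            else:
                # side of the largest square whose bottom-right cell is (i, j)
                side[i][j] = 1 + min(side[i - 1][j], side[i][j - 1], side[i - 1][j - 1])
            if side[i][j] > best_k:
                best_k, best_i, best_j = side[i][j], i, j
    if best_k == 0:
        return None
    offset = (best_k - 1) / 2
    return best_k, (best_i - offset, best_j - offset)
```

With grid spacing h and cell (0, 0) centred at (x0, y0) on the sensing plane, the returned centre maps to (x0 + col * h, y0 + row * h), and by the bound above its Chebychev value is at least k * h / 2.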




Figure 6. Example of distinguishable points. For the multiple poses of Figure 1, we show three different 
sensing plane projections. The small circles mark positions, on the defined grid, that are distinguishable 
in these poses. 
Bounds on Transform Errors 


In order to use such an algorithm, we need to determine values for two parameters. First, 
errors in the computed transformation associated with a pose will affect the threshold 
needed to determine distinguishability. For example, if there is no error in the computed 
transformation, then two surface normals are distinguishable if the angle between them 
exceeds the range of error in measuring such normals. When error is present in the 
transformation, its effect on the expected surface normals must be added to this 
threshold, thereby reducing the set of distinguishable normals. Second, we need a bound on 
the minimum Chebychev value (or its approximation) such that errors in the computed 
transformation will not affect the expected values of the sensor along the sensing ray. 
In order to deal with these parameters, in this section we will derive theoretical bounds 
on the possible errors in the computed transformations. In doing so, we will also derive 
criteria that can be imposed on the computation of the transformation from model 
coordinates to sensor coordinates in order to reduce the range of possible error. Depending 
on the sensor data available, it may not always be possible to satisfy these criteria, in 
which case higher possible errors will have to be tolerated. 

Computing the Transform 

There are many different methods for determining the transformation from model 
coordinates to sensor coordinates, and the errors associated with that computation will 
clearly depend on the specific method. To illustrate the disambiguation technique 
developed here, we choose one particular scheme, and derive specific error bounds on 
the model transformation for that scheme. This will then allow us to actually test our 
disambiguation algorithm. We begin by reviewing the process used in [Grimson and 
Lozano-Perez 84] for computing the transformation from model coordinates to sensor 
coordinates. 

We are given a set of possible poses of the sensed data, each one consisting of a set of triples $(\mathbf{p}_{s,i}, \mathbf{n}_{s,i}, f_i)$, where $\mathbf{p}_{s,i}$ is the vector representing the sensed position, $\mathbf{n}_{s,i}$ is the vector representing the sensed normal, and $f_i$ is the face assigned to this sensed data for that particular pose. We want to determine the actual transformation from model coordinates to sensor coordinates corresponding to the pose [see also Grimson and Lozano-Perez 84].

We assume that a vector in the model coordinate system is transformed into a vector in the sensor coordinate system by the following transformation:

$$\mathbf{v}_s = R\,\mathbf{v}_m + \mathbf{v}_0$$

where $R$ is a rotation matrix and $\mathbf{v}_0$ is some translation vector. We need to solve for $R$ and $\mathbf{v}_0$.


Rotation Component 

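Suppose $\mathbf{n}_{m,i}$ is the unit normal, in model coordinates, of face $f_i$, and $\mathbf{n}_{s,i}$ is the corresponding unit normal in sensor coordinates. Given two such pairs of model and sensor normals, an estimate of the direction of rotation $\mathbf{r}_{ij}$, such that a rotation about that direction would take $\mathbf{n}_{m,i}$ into $\mathbf{n}_{s,i}$ (and $\mathbf{n}_{m,j}$ into $\mathbf{n}_{s,j}$), is given by the unit vector in the direction of

$$(\mathbf{n}_{m,i} - \mathbf{n}_{s,i}) \times (\mathbf{n}_{m,j} - \mathbf{n}_{s,j}).$$

This works because a rotation about $\mathbf{r}$ preserves the component of each normal along $\mathbf{r}$, so each difference $\mathbf{n}_{m,i} - \mathbf{n}_{s,i}$ is perpendicular to $\mathbf{r}$.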


If there were no error in the sensed normals, we would be done. With error included in the measurements, however, the computed rotation direction $\mathbf{r}$ could be slightly wrong. One way to reduce the effect of this error is to compute all possible $\mathbf{r}_{ij}$ as $i$ and $j$ vary over the faces of the pose, and then cluster these computed directions to determine a value for the direction of rotation $\mathbf{r}$.
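The axis estimate can be sketched in a few lines. This is a minimal illustration, assuming normals are given as 3-tuples of floats; the clustering step is replaced by a simpler stand-in of our own (sign-align every $\mathbf{r}_{ij}$ with the first one, then average), and the function name `rotation_axis_estimate` is hypothetical rather than part of the original system.

```python
import itertools
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def rotation_axis_estimate(pairs, min_norm=1e-9):
    """Estimate the rotation direction r from (n_model, n_sensor) unit-normal pairs.

    Each pair of pairs (i, j) yields r_ij along (n_m,i - n_s,i) x (n_m,j - n_s,j).
    Stand-in for clustering: flip each r_ij to agree in sign with the first
    estimate, then average and renormalize.
    """
    estimates = []
    for (nm_i, ns_i), (nm_j, ns_j) in itertools.combinations(pairs, 2):
        c = cross(sub(nm_i, ns_i), sub(nm_j, ns_j))
        if math.sqrt(dot(c, c)) > min_norm:  # skip degenerate (parallel) differences
            estimates.append(normalize(c))
    if not estimates:
        raise ValueError("no non-degenerate pair of normals")
    ref = estimates[0]
    total = [0.0, 0.0, 0.0]
    for e in estimates:
        sign = 1.0 if dot(e, ref) >= 0.0 else -1.0
        for k in range(3):
            total[k] += sign * e[k]
    return normalize(total)
```

Because each $\mathbf{r}_{ij}$ is only defined up to sign, the averaged axis still carries the $(\mathbf{r},\theta)$ versus $(-\mathbf{r},-\theta)$ ambiguity; a sign convention has to be applied afterwards.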


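Once we have computed a direction of rotation $\mathbf{r}$, we need to determine the angle $\theta$ of rotation about it. This is given by

$$\cos\theta = \frac{\mathbf{n}_{s,i}\cdot\mathbf{n}_{m,i} - (\mathbf{r}\cdot\mathbf{n}_{s,i})(\mathbf{r}\cdot\mathbf{n}_{m,i})}{1 - (\mathbf{r}\cdot\mathbf{n}_{s,i})(\mathbf{r}\cdot\mathbf{n}_{m,i})}, \qquad \sin\theta = \frac{(\mathbf{r}\times\mathbf{n}_{m,i})\cdot\mathbf{n}_{s,i}}{1 - (\mathbf{r}\cdot\mathbf{n}_{s,i})(\mathbf{r}\cdot\mathbf{n}_{m,i})} \qquad (1)$$

Hence, given $\mathbf{r}$, we can solve for $\theta$. Note that if $\sin\theta$ is zero, there is a singularity in determining $\theta$, which could be either $0$ or $\pi$. In this case, however, $\mathbf{r}$ lies in the plane spanned by $\mathbf{n}_{s,i}$ and $\mathbf{n}_{m,i}$, and hence only the $\theta = \pi$ solution is valid.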

As before, in the presence of error, we may want to cluster the $\mathbf{r}$ vectors and then take the average of the computed values of $\theta$ over this cluster.
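Equation (1) maps directly onto `atan2`, which resolves the quadrant from the sine and cosine together. A minimal sketch, assuming the rotation convention $R(\mathbf{r},\theta)\mathbf{v} = \cos\theta\,\mathbf{v} + (1-\cos\theta)(\mathbf{r}\cdot\mathbf{v})\mathbf{r} + \sin\theta\,(\mathbf{r}\times\mathbf{v})$ used in the error analysis of this section; `rotation_angle` is a hypothetical name.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotation_angle(r, n_m, n_s, eps=1e-12):
    """Angle of the rotation about unit axis r that takes n_m into n_s.

    Uses the cos/sin expressions of equation (1); atan2 picks the correct
    quadrant, so theta is returned in (-pi, pi].
    """
    rm, rs = dot(r, n_m), dot(r, n_s)
    denom = 1.0 - rs * rm
    if abs(denom) < eps:
        raise ValueError("normal is (anti)parallel to the axis; angle undefined")
    cos_t = (dot(n_s, n_m) - rs * rm) / denom
    sin_t = dot(cross(r, n_m), n_s) / denom
    return math.atan2(sin_t, cos_t)
```

Both expressions share the denominator $1 - (\mathbf{r}\cdot\mathbf{n}_s)(\mathbf{r}\cdot\mathbf{n}_m)$, which vanishes when the face normal lies along the axis; such faces carry no information about $\theta$ and are rejected.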


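Finally, given values for both $\mathbf{r}$ and $\theta$, we can determine the rotation matrix $R$. Let $r_x, r_y, r_z$ denote the components of $\mathbf{r}$. Then

$$R = \cos\theta \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + (1-\cos\theta) \begin{bmatrix} r_x^2 & r_x r_y & r_x r_z \\ r_y r_x & r_y^2 & r_y r_z \\ r_z r_x & r_z r_y & r_z^2 \end{bmatrix} + \sin\theta \begin{bmatrix} 0 & -r_z & r_y \\ r_z & 0 & -r_x \\ -r_y & r_x & 0 \end{bmatrix}.$$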
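The matrix above is the standard axis-angle (Rodrigues) form. A minimal sketch that builds it from a unit axis and an angle, with matrices as lists of rows; `rotation_matrix` and `apply` are hypothetical helper names.

```python
import math


def rotation_matrix(r, theta):
    """R = cos(theta) I + (1 - cos(theta)) r r^T + sin(theta) [r]_x, for unit r."""
    rx, ry, rz = r
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + k * rx * rx,      k * rx * ry - s * rz, k * rx * rz + s * ry],
        [k * ry * rx + s * rz, c + k * ry * ry,      k * ry * rz - s * rx],
        [k * rz * rx - s * ry, k * rz * ry + s * rx, c + k * rz * rz],
    ]


def apply(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))
```

For example, `apply(rotation_matrix((0, 0, 1), math.pi / 2), (1, 0, 0))` is, up to rounding, `(0, 1, 0)`: a quarter turn about $z$ takes $x$ into $y$.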


Note that in computing the rotation component of the transformation, we have ignored the ambiguity inherent in the computation. That is, there are two solutions to the problem, $(\mathbf{r}, \theta)$ and $(-\mathbf{r}, -\theta)$. We assume that a simple convention concerning the sign of the rotation is used to choose one of the two solutions.


Translation Component 

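Next, we need to solve for the translation component of the transformation. Suppose we consider three triplets from the pose, $(\mathbf{p}_{s,i}, \mathbf{n}_{s,i}, f_i)$, $(\mathbf{p}_{s,j}, \mathbf{n}_{s,j}, f_j)$, and $(\mathbf{p}_{s,k}, \mathbf{n}_{s,k}, f_k)$, such that the triple product $\mathbf{n}_{m,i}\cdot(\mathbf{n}_{m,j}\times\mathbf{n}_{m,k})$ is non-zero (i.e., the three face normals are independent). Then it can be shown that the translation component of the transformation, $\mathbf{v}_0$, is given by

$$[\mathbf{n}_{m,i}\cdot(\mathbf{n}_{m,j}\times\mathbf{n}_{m,k})]\,\mathbf{v}_0 = (\mathbf{n}_{s,i}\cdot\mathbf{p}_{s,i} - d_i)(\mathbf{n}_{s,j}\times\mathbf{n}_{s,k}) + (\mathbf{n}_{s,j}\cdot\mathbf{p}_{s,j} - d_j)(\mathbf{n}_{s,k}\times\mathbf{n}_{s,i}) + (\mathbf{n}_{s,k}\cdot\mathbf{p}_{s,k} - d_k)(\mathbf{n}_{s,i}\times\mathbf{n}_{s,j})$$

where $d_i$ is the constant offset of face $f_i$ in model coordinates.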

As in the case of rotation, if there is no error in the measurements, then we are done. The simplest means of attempting to reduce the effects of error on the computation is to average $\mathbf{v}_0$ over all possible trios of triplets from the pose.
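The triple-product formula is Cramer's rule applied to the three plane equations $\mathbf{n}_i\cdot\mathbf{v}_0 = \mathbf{n}_i\cdot\mathbf{p}_{s,i} - d_i$. A minimal sketch of the formula and of the averaging over trios, assuming each sample is a tuple `(p_s, n_s, n_m, d)` of sensed point, sensed normal, model normal of the assigned face, and that face's offset; the function names are hypothetical.

```python
import itertools


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def translation_from_trio(trio, min_triple=1e-9):
    """Solve for v0 from three (p_s, n_s, n_m, d) samples; None if normals are dependent."""
    (pi, nsi, nmi, di), (pj, nsj, nmj, dj), (pk, nsk, nmk, dk) = trio
    triple = dot(nmi, cross(nmj, nmk))
    if abs(triple) < min_triple:
        return None  # face normals (nearly) coplanar: skip this trio
    bi = dot(nsi, pi) - di
    bj = dot(nsj, pj) - dj
    bk = dot(nsk, pk) - dk
    cjk, cki, cij = cross(nsj, nsk), cross(nsk, nsi), cross(nsi, nsj)
    return tuple((bi * cjk[c] + bj * cki[c] + bk * cij[c]) / triple for c in range(3))


def average_translation(samples):
    """Average the v0 estimates over every trio with independent face normals."""
    estimates = [v for v in (translation_from_trio(t) for t in itertools.combinations(samples, 3))
                 if v is not None]
    if not estimates:
        raise ValueError("no trio with independent face normals")
    return tuple(sum(v[c] for v in estimates) / len(estimates) for c in range(3))
```

Trios whose model normals are nearly coplanar are skipped, since a small triple product amplifies measurement error in the division.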
Errors in the computed transformation

We now consider possible errors in each of the parameters of the transformation, as a function of error in the sensor measurements. The results are summarized below; more explicit details may be found in the appendix.

Errors in r. 

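Let $\mathbf{n}_{m,i}$ be the unit normal of face $f_i$ in model coordinates, let $\mathbf{n}'_{m,i}$ be the associated unit normal transformed into sensor coordinates, and let $\mathbf{n}_{s,i}$ be the actual measured unit normal in sensor coordinates. Suppose that the sensitivity of the measuring device is $\epsilon_n$, that is,

$$\mathbf{n}'_{m,i}\cdot\mathbf{n}_{s,i} \ge \epsilon_n.$$

Then an absolute bound on the possible error in the computed value for the direction of rotation, $\mathbf{r}_c$, in relation to the true direction of rotation, $\mathbf{r}_t$, is given by

$$\mathbf{r}_t\cdot\mathbf{r}_c \ge \frac{\delta_i\delta_j\sqrt{1-\eta^2}}{\sqrt{1 - \left\{\delta_i\delta_j\eta - \sqrt{1-\delta_i^2}\sqrt{1-\delta_j^2}\right\}^2}}$$

where

$$\delta_i = \sqrt{\frac{\epsilon_n - \gamma_i}{1-\gamma_i}}, \qquad \gamma_i = \mathbf{n}_{m,i}\cdot\mathbf{n}'_{m,i}, \qquad \eta = \mathbf{v}_i\cdot\mathbf{v}_j, \qquad \mathbf{v}_i = \frac{\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}}{|\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}|}.$$

Note that if $\gamma_i$ is close to $\epsilon_n$, then the error bound becomes increasingly large. This is to be expected, since in this case $\mathbf{n}_{m,i} \approx \mathbf{n}'_{m,i}$, and thus small errors in the position of $\mathbf{n}'_{m,i}$ can lead to large errors in the position of $\mathbf{r}$. Similarly, if $\eta$ is near 1, large errors can also result. If we restrict our computation (where possible) to cases where $\gamma_i$ and $\eta$ are small, then we have an approximate bound on the error in computing the direction of rotation given by

$$\mathbf{r}_t\cdot\mathbf{r}_c \gtrsim \epsilon_n.$$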

This bound is supported by the results of the simulations reported in [Grimson and Lozano-Perez 84].
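The bound on $\mathbf{r}_t\cdot\mathbf{r}_c$ is easy to evaluate numerically. A minimal sketch, assuming $\gamma_i < \epsilon_n$ so that $\delta_i$ is defined; `delta_bound` and `axis_error_bound` are hypothetical names.

```python
import math


def delta_bound(gamma, eps_n):
    """delta_i = sqrt((eps_n - gamma_i) / (1 - gamma_i)), defined for gamma_i < eps_n."""
    if not gamma < eps_n:
        raise ValueError("bound requires gamma < eps_n")
    return math.sqrt((eps_n - gamma) / (1.0 - gamma))


def axis_error_bound(delta_i, delta_j, eta):
    """Worst-case lower bound on r_t . r_c (errors on opposite sides of the v_i, v_j plane)."""
    inner = delta_i * delta_j * eta - math.sqrt(1.0 - delta_i ** 2) * math.sqrt(1.0 - delta_j ** 2)
    return delta_i * delta_j * math.sqrt(1.0 - eta ** 2) / math.sqrt(1.0 - inner ** 2)
```

With $\epsilon_n = \cos 5^\circ$ and $\gamma_i = \gamma_j = \eta = 0$, the bound evaluates to about 0.9962, essentially $\epsilon_n$ itself, which matches the approximate bound $\mathbf{r}_t\cdot\mathbf{r}_c \gtrsim \epsilon_n$. Increasing $\eta$ lowers the bound, as the text predicts.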



Errors in $\theta$.


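We know that the angle of rotation $\theta$ is given by

$$\tan\theta = \frac{\mathbf{n}'_m\cdot(\mathbf{r}\times\mathbf{n}_m)}{(\mathbf{r}\times\mathbf{n}_m)\cdot(\mathbf{r}\times\mathbf{n}'_m)}$$

where $\mathbf{n}_m$ is the unit normal of a face in model coordinates, $\mathbf{n}'_m$ is the corresponding normal transformed into sensor coordinates, and $\mathbf{r}$ is the direction of rotation.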

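If we let $\mathbf{r}_t$ denote the true direction of rotation, $\mathbf{r}_c$ denote the computed direction of rotation, and $\mathbf{n}_s$ denote the measured surface normal corresponding to $\mathbf{n}'_m$, then the constraints on the error in computing the angle $\theta$ are that $\mathbf{r}_t\cdot\mathbf{r}_c \ge \cos\psi$ and $\mathbf{n}'_m\cdot\mathbf{n}_s \ge \epsilon_n = \cos\phi$. In the appendix we show that the correct value for $\theta$ is given by

$$\tan\theta_t = \frac{1}{\sin\nu}\,\frac{\cos\alpha}{\cos\beta}$$

where

$$\cos\nu = \mathbf{n}_m\cdot\mathbf{r}_t, \qquad \cos\alpha = \mathbf{n}'_m\cdot\frac{\mathbf{r}_t\times\mathbf{n}_m}{|\mathbf{r}_t\times\mathbf{n}_m|}, \qquad \cos\beta = \frac{\mathbf{r}_t\times\mathbf{n}'_m}{|\mathbf{r}_t\times\mathbf{n}'_m|}\cdot\frac{\mathbf{r}_t\times\mathbf{n}_m}{|\mathbf{r}_t\times\mathbf{n}_m|}.$$

Furthermore, we show in the appendix that the worst case for the computed value of $\theta$ is given by

$$\tan\theta_c = \frac{1}{\sin\omega}\,\frac{\cos(\alpha - [\phi + \xi])}{\cos(\beta + [\psi + \gamma])}$$

where

$$\cos\omega = \cos(\phi+\psi) - (1-\cos\nu)\cos\phi\cos\psi, \qquad \cos\xi = \cos\psi\sqrt{\frac{1-\cos^2\nu}{1-\cos^2\psi\cos^2\nu}}, \qquad \cos\gamma = \cos\phi\cos\psi\sqrt{\frac{1-\cos^2\nu}{1-\cos^2\omega}}.$$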

We could use these expressions to derive bounds on the possible variation in $\theta$ as a function of $\phi$ and $\psi$, but this is a rather messy task. Instead, we show in the appendix that if $\phi$ and $\psi$ are small, then an estimate for $\Delta\theta$ such that

$$\tan(\theta_t + \Delta\theta) \approx \tan\theta_c$$

is given by

$$|\Delta\theta| \approx |\phi + \psi|.$$

This bound is supported by the results of simulations reported in [Grimson and Lozano-Perez 84].


Errors in $R\mathbf{v}$

We have computed expressions for the possible error in $\mathbf{r}$ and $\theta$. In particular, we will denote the error in $\theta$ by $\Delta\theta$ and the vector error in $\mathbf{r}$ by $\delta\mathbf{r}$, such that $\mathbf{r}\cdot\delta\mathbf{r} = 0$. We now consider the problem of estimating bounds on the possible error in applying the computed rotation matrix to an arbitrary vector $\mathbf{v}$. We know that the rotational component of the transformation of $\mathbf{v}$ is given by

$$R(\mathbf{r},\theta)\,\mathbf{v} = \cos\theta\,\mathbf{v} + (1-\cos\theta)(\mathbf{r}\cdot\mathbf{v})\,\mathbf{r} + \sin\theta\,(\mathbf{r}\times\mathbf{v})$$

where $\mathbf{r}$ and $\theta$ are the parameters determining the rotation.

We show in the appendix that if we ignore higher-order terms, a Taylor series expansion yields the following bound on errors in the computed value of a rotation:

$$|R(\mathbf{r}+\delta\mathbf{r},\theta+\Delta\theta)\,\mathbf{v} - R(\mathbf{r},\theta)\,\mathbf{v}| \le (2|\delta\mathbf{r}| + |\Delta\theta|)\,|\mathbf{v}|.$$

Now, if the errors $\phi$ and $\psi$ are small, then we know that

$$|\Delta\theta| \le |\phi + \psi|.$$

Furthermore,

$$|\delta\mathbf{r}| = |\sin\psi| \approx |\psi|$$

and this implies a bound on the variation in $R\mathbf{v}$ of

$$|3\psi + \phi|\,|\mathbf{v}|.$$

Moreover, if we are careful to restrict our computation appropriately, then $\psi \approx \phi$, and thus

$$|R(\mathbf{r}+\delta\mathbf{r},\theta+\Delta\theta)\,\mathbf{v} - R(\mathbf{r},\theta)\,\mathbf{v}| \le |4\phi|\,|\mathbf{v}|.$$


Effective bounds on rotation errors 

Unfortunately, this is still a fairly weak bound. For example, an error cone of modest radius about the measured surface normals could give rise to potential errors in the computed rotation on the order of the magnitude of the rotated vector (the bound $4\phi|\mathbf{v}|$ already reaches $|\mathbf{v}|$ at $\phi = 1/4$ radian, about $14^\circ$). This is obviated to a large extent by the fact that we do not rely on a single measurement in computing the transformation parameters. Rather, we use several sets of measurements and use the mean value as the result when computing $\mathbf{r}$ and $\theta$.

To see how this helps reduce the effective bound, consider the following argument. Suppose that the error in computing $\theta$ is uniformly distributed over the range $[-2\phi, 2\phi]$. If we take $n$ measurements and average, then the distribution of error about the correct value $\theta_t$ should approach a normal distribution, by the Central Limit Theorem. Under the uniform assumption, the variance of the error in each measurement is

$$\frac{(4\phi)^2}{12} = \frac{4\phi^2}{3}.$$

If there is no systematic error in the measurements, i.e., each measurement error can be considered independent of the others, then the distribution of the average error is essentially a zero-mean normal distribution with variance

$$\frac{4\phi^2}{3n}$$

and hence with standard deviation

$$\sqrt{\frac{4\phi^2}{3n}}.$$

Similarly, if the magnitude of the error vector $\delta\mathbf{r}$ associated with the computation of the direction of rotation $\mathbf{r}$ is uniformly distributed over its possible range, and the measurements are independent, then the distribution of error in $2|\delta\mathbf{r}|$ is given by an identical normal distribution, since the maximum error in $|\delta\mathbf{r}|$ is essentially $\phi$. By combining the two independent distributions (their variances add), the error in the computation of $R\mathbf{v}$ is given by a zero-mean normal distribution with variance

$$\frac{8\phi^2}{3n}.$$

While an absolute bound on the error in computing $R\mathbf{v}$ is given by $4|\phi|\,|\mathbf{v}|$, tighter but less certain bounds are possible. For example, if we require a 0.95 probability that the error does not exceed the bound, then the bound is given by the normal distribution error function; in this particular case it is

$$3.92\sqrt{\frac{2}{3n}}\;\phi.$$

As the number of samples $n$ increases, this bound becomes increasingly tight.

Note that while we have assumed a uniform distribution of the errors in the individual measurements, this is not a critical assumption. Since we are only seeking estimates of the bounds on computational error, other distributions will give similar results.

In summary, given some lower bound on the number of samples used in computing the transformation from model coordinates to sensor coordinates, and given that the assumption of small errors in the measurements of surface orientation holds, the error in the computed rotation of a vector $\mathbf{v}$ is given by a zero-mean normal distribution, scaled by the magnitude $|\mathbf{v}|$, with standard deviation

$$2\phi\sqrt{\frac{2}{3n}}$$

where $n$ is the number of measurement samples and $\phi$ is the angle of maximum error in the measurement of surface orientation at each sample point.
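The statistical bound can be checked with a small simulation. This sketch assumes, as the text does, that the $\theta$ error and the $2|\delta\mathbf{r}|$ error of each sample are independent and uniform on $[-2\phi, 2\phi]$; the function names are hypothetical, and `statistics.NormalDist` (Python 3.8+) supplies the normal quantile.

```python
import math
import random
from statistics import NormalDist, pstdev


def rotation_error_sigma(phi, n):
    """Std of the averaged rotation error for a unit vector: sqrt(8 phi^2 / (3 n))."""
    return math.sqrt(8.0 * phi * phi / (3.0 * n))


def rotation_error_bound(phi, n, confidence=0.95):
    """Two-sided normal bound; 0.95 gives about 3.92 * sqrt(2 / (3 n)) * phi."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * rotation_error_sigma(phi, n)


def simulate_sigma(phi, n, trials=2000, seed=0):
    """Monte Carlo estimate: average n uniform theta errors and n uniform 2|dr|
    errors (both on [-2 phi, 2 phi]), add the two averages, take the spread."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        theta_err = sum(rng.uniform(-2 * phi, 2 * phi) for _ in range(n)) / n
        axis_err = sum(rng.uniform(-2 * phi, 2 * phi) for _ in range(n)) / n
        totals.append(theta_err + axis_err)
    return pstdev(totals)
```

For $\phi = 0.05$ and $n = 10$ the simulated spread agrees with $2\phi\sqrt{2/(3n)}$ to within a few percent, and the 0.95 quantile reproduces the $3.92\sqrt{2/(3n)}\,\phi$ figure.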


Errors in $\mathbf{v}_0$.


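We know that the translation component of the transformation is given by

$$[\mathbf{n}_{m,i}\cdot(\mathbf{n}_{m,j}\times\mathbf{n}_{m,k})]\,\mathbf{v}_0 = (\mathbf{n}'_{m,i}\cdot\mathbf{p}_{s,i} - d_i)(\mathbf{n}'_{m,j}\times\mathbf{n}'_{m,k}) + (\mathbf{n}'_{m,j}\cdot\mathbf{p}_{s,j} - d_j)(\mathbf{n}'_{m,k}\times\mathbf{n}'_{m,i}) + (\mathbf{n}'_{m,k}\cdot\mathbf{p}_{s,k} - d_k)(\mathbf{n}'_{m,i}\times\mathbf{n}'_{m,j})$$

where $\mathbf{n}_{m,i}$ is a face normal in model coordinates, $\mathbf{n}'_{m,i}$ is the corresponding face normal transformed into sensor coordinates, $\mathbf{p}_{s,i}$ is the position vector of the contact point in sensor coordinates, and $d_i$ is the constant offset for face $i$.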

If the error in measured surface normals is given by $\epsilon_n = \cos\phi$, such that $\mathbf{n}'_m\cdot\mathbf{n}_s \ge \epsilon_n$, and the error in measured contact positions is bounded in magnitude by $\epsilon_d$, then the error in each component

$$(\mathbf{n}'_{m,k}\cdot\mathbf{p}_{s,k} - d_k)(\mathbf{n}'_{m,i}\times\mathbf{n}'_{m,j})$$

is bounded in magnitude by

$$\sqrt{\{s\sin\zeta - (s+\Delta)\sin(\zeta - 2\phi)\}^2 + (s+\Delta)^2\sin(2\zeta)\sin(4\phi)}$$

where

$$s = \mathbf{n}'_{m,k}\cdot\mathbf{p}_{s,k} - d_k, \qquad \Delta \le \epsilon_d + |\mathbf{p}_{s,k}|\sqrt{2}\sqrt{1-\epsilon_n}, \qquad \cos\zeta = \mathbf{n}_{m,i}\cdot\mathbf{n}_{m,j}.$$

If we restrict our computation to cases in which the faces are nearly orthogonal, then this bound on the components of the translation vector reduces to

$$|s - (s+\Delta)\cos(2\phi)|.$$
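For nearly orthogonal faces the per-component bound needs only $s$, $\Delta$ and $\phi$. A minimal sketch, assuming the bound on $\Delta$ stated above (position error plus the effect of the normal error on $\mathbf{n}\cdot\mathbf{p}$, using $|\mathbf{n}_s - \mathbf{n}'_m| \le \sqrt{2(1-\epsilon_n)}$); the function names are hypothetical.

```python
import math


def offset_error(eps_d, p_norm, eps_n):
    """Delta <= eps_d + |p_s,k| * sqrt(2 (1 - eps_n))."""
    return eps_d + p_norm * math.sqrt(2.0 * (1.0 - eps_n))


def translation_component_bound(s, delta, phi):
    """Near-orthogonal faces: error in one translation component <= |s - (s + Delta) cos(2 phi)|."""
    return abs(s - (s + delta) * math.cos(2.0 * phi))
```

With $\phi = 2^\circ$, $|\mathbf{p}_{s,k}| = 0.5$, $\epsilon_d = 0.001$ and $s = 1$, the bound comes out at roughly 0.016 in the same length units as $s$; it vanishes when both $\phi$ and $\Delta$ are zero.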
Choosing the Parameters


We now consider deriving a formal definition of distinguishability as applied to surface normals and to distances. Consider first the case of distinguishing poses on the basis of measured surface normals. Suppose that $\alpha$ denotes the angle between surface normals associated with two different possible poses. What is the minimum size of $\alpha$ needed to distinguish these poses?

Clearly, the expected normals must differ by an amount that is bigger than the sensitivity of the measurements themselves. Thus, $\alpha$ must at least exceed $2\cos^{-1}\epsilon_n = 2\phi$. Since there is also some error associated with the computed transformation for each pose, the angle must also exceed this error. By the previous analysis, if we use a single measurement to determine the rotation matrix $R$, then this error is bounded by $|4\phi|$, and hence we have the bound

$$\alpha > 2\phi + 2(4\phi) = 10\phi.$$

For most values of $\phi$, this bound is far too large to be of much use.

If we use several measurements to compute $R$, however, then more effective bounds can be used. As shown previously, assuming no systematic error implies that the error in the computed surface normal associated with a transformed face is given by a zero-mean normal distribution with standard deviation

$$2\phi\sqrt{\frac{2}{3n}}$$

where $n$ is the number of measurement samples.

This gives us a tighter definition of distinguishable surface normals. In particular, if $\alpha$ denotes the angle between surface normals associated with two distinct poses, those poses are distinguishable if

$$\alpha > 2\phi + 2\rho\phi.$$

The first term denotes the range of possible error in the measurement of the surface normals, and the second term denotes the range of possible error in the expected values of the surface normals. Here, $\rho$ is a scale factor that is a function of the reliability of the error bound. That is, $\rho(c)\,\phi$ denotes the point in the normal distribution described above such that a fraction $c$ of the weight of the distribution lies within $\pm\rho(c)\,\phi$.

For example, if the cutoff on the reliability of the bound is 0.95, and the number of measurements involved in computing the transformation is at least 10, then $\rho \approx 1.01$ (smaller for more measurements), and thus the bound on two surface normals being distinguishable is

$$\alpha > 4.02\phi.$$
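The scale factor $\rho$ follows from the normal quantile and the standard deviation $2\phi\sqrt{2/(3n)}$. A minimal sketch, assuming a two-sided interval (which is what reproduces $\rho \approx 1.01$ at $c = 0.95$, $n = 10$); `rho` and `normals_distinguishable` are hypothetical names.

```python
import math
from statistics import NormalDist


def rho(confidence, n):
    """Half-width of the two-sided c-interval of the averaged error, in units of phi."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * 2.0 * math.sqrt(2.0 / (3.0 * n))


def normals_distinguishable(alpha, phi, confidence=0.95, n=10):
    """Poses are separable on normals if alpha > 2 phi + 2 rho phi."""
    return alpha > 2.0 * phi + 2.0 * rho(confidence, n) * phi
```

Here `rho(0.95, 10)` is about 1.012, giving the threshold of roughly $4.02\phi$; with a single measurement ($n = 1$) the same rule demands about $8.4\phi$.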

We can also derive a formal definition of distinguishability based on position measurements. We first note that if a face is defined by the pair $(\mathbf{n}_m, d)$ in model coordinates, such that a point $\mathbf{v}$ lies on the plane of the face if

$$\mathbf{v}\cdot\mathbf{n}_m - d = 0,$$

then the same face, after transformation, is defined by the pair $(\mathbf{n}'_m, d')$, where

$$\mathbf{n}'_m = R\,\mathbf{n}_m, \qquad d' = d + \mathbf{v}_0\cdot R\,\mathbf{n}_m.$$
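The plane transformation follows from substituting $\mathbf{v}_m = R^{T}(\mathbf{v}_s - \mathbf{v}_0)$ into $\mathbf{v}_m\cdot\mathbf{n}_m = d$. A minimal sketch with matrices as lists of rows; `transform_face` and `apply` are hypothetical helper names.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def transform_face(n_m, d, R, v0):
    """Map the model plane v . n_m = d into sensor coordinates.

    Returns (n', d') with n' = R n_m and d' = d + v0 . (R n_m).
    """
    n_p = apply(R, n_m)
    return n_p, d + dot(v0, n_p)
```

As a check, any model point on the face, once mapped by $R\mathbf{p} + \mathbf{v}_0$, satisfies $\mathbf{n}'_m\cdot\mathbf{p}_s = d'$.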
We let the error associated with the computed value of $\mathbf{n}'_m$ be denoted by $\mathbf{w}$, such that $\mathbf{w}\cdot\mathbf{n}'_m = 0$. By the discussion above, the magnitude of $\mathbf{w}$ can be bounded above by

$$|\sin 4\phi|.$$

We also let the error in computing $\mathbf{v}_0$ be denoted by $\mathbf{u}$. We now seek a bound on the possible errors in computing the point of intersection of a sensing ray with an object face, due to errors in the computed transformation.

Suppose that the sensing ray is given by $\alpha\mathbf{s} + \mathbf{o}$, where $\mathbf{s}$ is a specified unit vector, $\mathbf{o}$ is a specified offset vector orthogonal to $\mathbf{s}$, and $\alpha$ is the free parameter specifying position along the sensing ray. The correct parameter of intersection of the sensing ray with the transformed face is given by the value of $\alpha$ such that

$$(\alpha\mathbf{s} + \mathbf{o})\cdot\mathbf{n}'_m - d' = 0$$

or

$$\alpha_t = \frac{d' - \mathbf{o}\cdot\mathbf{n}'_m}{\mathbf{s}\cdot\mathbf{n}'_m}.$$


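On the other hand, if we include the potential error in the computed transformation, then the point of intersection is given by

$$\alpha_c = \frac{[d' + \mathbf{u}\cdot(\mathbf{n}'_m + \mathbf{w}) + \mathbf{v}_0\cdot\mathbf{w}] - \mathbf{o}\cdot(\mathbf{n}'_m + \mathbf{w})}{\mathbf{s}\cdot(\mathbf{n}'_m + \mathbf{w})}$$

and thus the difference is given by

$$\alpha_c - \alpha_t = \frac{\mathbf{u}\cdot\mathbf{n}'_m + (\mathbf{u} + \mathbf{v}_0 - \mathbf{o})\cdot\mathbf{w}}{\mathbf{s}\cdot\mathbf{n}'_m + \mathbf{s}\cdot\mathbf{w}} - \frac{\mathbf{s}\cdot\mathbf{w}}{\mathbf{s}\cdot\mathbf{n}'_m + \mathbf{s}\cdot\mathbf{w}}\,\alpha_t.$$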


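As a consequence, provided $|\mathbf{s}\cdot\mathbf{n}'_m| > |\mathbf{w}|$, we can bound the error in the expected intersection point of the sensing ray with the face by

$$r \le \frac{|\mathbf{u}| + (|\mathbf{u}| + |\mathbf{v}_0 - \mathbf{o}|)\,|\mathbf{w}|}{|\mathbf{s}\cdot\mathbf{n}'_m| - |\mathbf{w}|} + \frac{|\mathbf{w}|}{|\mathbf{s}\cdot\mathbf{n}'_m| - |\mathbf{w}|}\,|\alpha_t|,$$

where

$\mathbf{w}$ is the error in computing $\mathbf{n}'_m = R\,\mathbf{n}_m$,
$\alpha_t$ is the predicted intersection point,
$\mathbf{u}$ is the error in $\mathbf{v}_0$,
$\mathbf{s}$ is the sensing direction,
$\mathbf{o}$ is the sensing offset vector,
$\mathbf{n}'_m$ is the face normal in transformed coordinates,
$\mathbf{v}_0$ is the computed translation.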

Thus, given this bound, two poses are distinguishable if their expected points of intersection are far enough apart:

$$|\alpha_1 - \alpha_2| > 2\epsilon_d + r_1 + r_2.$$
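The position-based test combines three pieces: the predicted ray parameter, the error bound $r$, and the separation condition. A minimal sketch, assuming the magnitudes $|\mathbf{u}|$ and $|\mathbf{w}|$ have already been bounded (for instance by the statistical arguments above); all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def ray_parameter(n_p, d_p, s, o, eps=1e-12):
    """alpha_t = (d' - o . n') / (s . n') for the ray alpha * s + o."""
    sn = dot(s, n_p)
    if abs(sn) < eps:
        raise ValueError("sensing ray is parallel to the face")
    return (d_p - dot(o, n_p)) / sn


def intersection_error_bound(alpha_t, n_p, s, o, v0, u_norm, w_norm):
    """Bound r on |alpha_c - alpha_t| given |u| (error in v0) and |w| (error in n')."""
    margin = abs(dot(s, n_p)) - w_norm
    if margin <= 0.0:
        raise ValueError("normal error too large relative to |s . n'|")
    numerator = u_norm + (u_norm + norm(sub(v0, o))) * w_norm + w_norm * abs(alpha_t)
    return numerator / margin


def positions_distinguishable(alpha_1, r_1, alpha_2, r_2, eps_d):
    """Two poses are separable along the ray if |alpha_1 - alpha_2| > 2 eps_d + r_1 + r_2."""
    return abs(alpha_1 - alpha_2) > 2.0 * eps_d + r_1 + r_2
```

The `margin` check mirrors the requirement $|\mathbf{s}\cdot\mathbf{n}'_m| > |\mathbf{w}|$: rays that graze a face are poor choices because the bound blows up as the denominator shrinks.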


As in the case of distinguishing on the basis of surface normals, the bounds for $\mathbf{w}$ and $\mathbf{u}$ may be too large to be practical. We can reduce these bounds by using several measurements to determine a value for $\mathbf{v}_0$. As in the previous case, this will lead to a zero-mean normal distribution of expected error, and the effective range of error will be reduced.

Finally, we need to place bounds on the minimum Chebychev values needed to guarantee that perturbations in the computed transformations will not cause the sensing ray to miss the intended face. Let $c$ denote the Chebychev value associated with a particular face, whose transformed unit surface normal is $\mathbf{n}'_m$, and let the unit sensing ray be given by $\mathbf{s}$. Then the modified Chebychev value in the sensing plane is given by $c\,|\mathbf{n}'_m\cdot\mathbf{s}|$. At the same time, the variation in the position of a point on the face, as a function of error in the computed transformation, is given by

$$\Delta\mathbf{p} = \mathbf{u} + \mathbf{q}$$

where, as above, $\mathbf{u}$ denotes the error in the computed value of $\mathbf{v}_0$ and $\mathbf{q}$ is the error in the computed value of $R\mathbf{p}$, where $\mathbf{p}$ is the Chebychev point in model coordinates. The magnitudes of these error vectors are bounded by the expressions derived above. Since the directions of the error vectors are arbitrary, the condition on the Chebychev value required to ensure contact with the face is

$$c\,|\mathbf{n}'_m\cdot\mathbf{s}| > |\mathbf{u}| + |\mathbf{q}|.$$

Thus, we have derived conditions on the parameters of the disambiguation algorithm 
needed to guarantee the performance of the algorithm. 


Discussion and Examples 

When Do We Compute the Sensing Directions? 

We have described a technique for determining optimal sensing directions. We have still 
to consider, however, how to interface such a technique with the general problem of 
recognition and localization. The simplest method is to obtain some initial set of sensory 
data points, apply our recognition technique, and then use the disambiguation process as 
required, based on the current set of consistent poses. For example, if there are several 
consistent poses, we could choose the first pair, compute an optimal sensing direction 
based on that pair and obtain a new data point. Then, we could determine which of 
the set of poses are also consistent with the new data point and iterate. This technique, 
while applicable to arbitrary sets of objects, has the disadvantage of high computational 
expense. 
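The iterate-as-needed strategy can be sketched as a loop over the current set of consistent poses. This is an illustration under stated assumptions only: `choose_ray`, `predict` and `sense` are hypothetical callbacks standing in for the sensing-direction computation, the per-pose expected value along a ray, and the physical sensor.

```python
def disambiguate(poses, choose_ray, predict, sense, max_steps=100):
    """Iterate-as-needed disambiguation over a list of consistent poses.

    choose_ray(a, b): sensing ray separating poses a and b (hypothetical callback)
    predict(pose, ray): value expected along the ray for that pose (hypothetical)
    sense(ray): value actually measured along the ray (hypothetical)
    Exact equality is used for the toy example; a real system would test the
    measurement against each pose's predicted range instead.
    """
    remaining = list(poses)
    steps = 0
    while len(remaining) > 1 and steps < max_steps:
        ray = choose_ray(remaining[0], remaining[1])
        value = sense(ray)
        remaining = [p for p in remaining if predict(p, ray) == value]
        steps += 1
    return remaining, steps
```

In a toy setting where poses are the integers 0 to 7 and a "ray" reads one bit of the pose, three sensing steps suffice to isolate any pose; the expense noted above comes from `choose_ray`, which must be recomputed for every new pair.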

In situations in which a large number of objects are possible, we may not be able to do 
any better than to compute sensing points as needed, based on the current set of feasible 
poses. In situations involving a single object, however, there may be an alternative 
method for integrating the computation of sensing positions with the interpretation of 
the sensory data. 

In particular, given the analysis developed here, one can precompute optimal sensing 
rays as a function of the difference in transformation associated with two poses. Take 
any pair of poses of an object. There exists a rigid transformation taking one pose into 
the other, which we can parameterize in some fashion. We compute the optimal sensing 
direction for this pair of poses, and insert it into a lookup table, whose dimensions are 
indexed by the parameters of the relative transformation. Since the workspace of the 
sensory system is bounded, this is a bounded table (that is, the translational degrees 
of freedom are not infinite in extent). The analysis can be used to compute an optimal sensing ray corresponding to each entry of the table, where the parameters of the transformation are quantized to some desired level.
Now, when attempting to disambiguate two possible poses, one simply computes the difference in the transformations, looks up the precomputed sensing ray in the appropriate slot of the table, transforms that ray by the transformation associated with the first pose, and then senses along that ray to obtain a new data point. That data point is added to the current set of sensory data, and the recognition and localization process is applied. If a unique pose results, the process is stopped; if not, a new sensing ray is obtained and the process continues.
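A minimal sketch of the precomputed table, assuming the relative transformation between two poses has already been reduced to a fixed-length tuple of parameters (for example, three rotation and three translation parameters) and quantized with one step size per parameter. The parameterization, the offline routine `optimal_ray`, and all function names are hypothetical; only the lookup-then-transform step follows the procedure described above.

```python
import math


def quantize(params, steps):
    """Round each relative-transform parameter to the nearest multiple of its step."""
    return tuple(int(math.floor(p / s + 0.5)) for p, s in zip(params, steps))


def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def build_table(param_samples, steps, optimal_ray):
    """Offline: store one precomputed ray per quantized cell.

    optimal_ray(params) -> (origin, direction) in the first pose's frame (hypothetical).
    """
    table = {}
    for params in param_samples:
        table.setdefault(quantize(params, steps), optimal_ray(params))
    return table


def lookup_ray(table, rel_params, steps, R1, t1):
    """Online: fetch the ray for this pose pair and map it into sensor
    coordinates with the first pose's transformation (R1, t1)."""
    origin, direction = table[quantize(rel_params, steps)]
    world_origin = tuple(a + b for a, b in zip(matvec(R1, origin), t1))
    return world_origin, matvec(R1, direction)
```

The table is finite because the workspace bounds the translational parameters; the quantization step trades table size against how well a stored ray suits the actual pair of poses.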

By precomputing the sensing rays, we can avoid the computational expense associated with finding a new sensing position, and at the same time take advantage of the efficiency of the technique in disambiguating multiple poses.

Avoiding False Negatives 

We have seen in the previous discussion that the analytic error bounds on the computed 
transformations for any pose are probably too large to be practical. We argued that one 
way to reduce these bounds was to use several measurements in the computation of the 
transformation. This led to a normal distribution of error in each of the components of 
the transformation, and thus, given a level of desired confidence in the algorithm, tighter 
bounds on the parameters were possible. In this case, we would expect that in general 
the algorithm will succeed, and we need only consider alterations to the algorithm to 
deal with the infrequent case when the errors in the computed transformation do exceed 
the expected thresholds. There are two situations that can arise in this case. The first 
is that the perturbation in the transform causes a surface normal to be sensed that
does not agree with any of the expected normals. This is essentially a false negative,
since it implies that the poses are not distinguishable. The more damaging case is a false 
positive, in which the perturbation in the transformation results in a sensor measurement 
that coincidentally agrees with the wrong pose. 

The easiest solution is to use more than one sense point. In this manner, false 
negatives are easily handled, since the expectation is that not all sensed points will give 
inconsistent data. This will be especially true if several sensing directions are used, in 
particular if the sensing directions are orthogonal. As well, it is likely that false positives 
can also be detected, since the expectation is that the correct pose will be found by most 
sensor points, again especially if several directions are used, and a simple voting scheme 
will arrive at the correct answer. 
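The voting scheme can be sketched as a tally over several sense points. This is a minimal illustration, assuming each additional sense point has already been reduced to the set of poses it is consistent with; `vote` is a hypothetical name, and reporting a tie as `None` (so that more points can be requested) is our choice.

```python
from collections import Counter


def vote(consistent_sets):
    """Return the pose supported by the most sense points, or None on a tie.

    Each element of consistent_sets is the set of poses that one additional
    sense point is consistent with; an empty set (a false negative) casts no vote.
    """
    counts = Counter()
    for poses in consistent_sets:
        counts.update(set(poses))
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

A false negative simply contributes an empty set and leaves the tally unchanged, while an isolated false positive is outvoted as long as most sense points agree on the correct pose.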


Testing the Algorithm 

We have implemented the described technique, and tested it on a number of examples. 
Because the worst case bounds are so large, we used the approximations described above, 
with the expectation that on occasion an incorrect decision would be made, but that such 
errors could be avoided by voting over several additional sensing points. 

In particular, we ran the algorithm described in [Grimson and Lozano-Perez 84] for an object in arbitrary orientation relative to the sensors and with simulated sensing from three orthogonal directions. Whenever there was an ambiguity in interpreting the sensed data, we used the following disambiguation technique. We used the analysis developed above to predict a sensing ray, and for each pose we predicted ranges of expected values for the sensory data along that ray. We then acquired an additional sense point along the chosen sensing ray and compared the recorded value with the expected ranges to choose a pose.

Using a variety of simulated sensing errors, the disambiguation technique was applied 
to 1000 ambiguous cases. It was found in 336 of these cases that, due to the large 
errors inherent in the sensory data, the algorithm could not distinguish reliably between 
the possible solutions. In all of these cases, the poses differed by the reassignment of 
one data point from one face to an adjacent face, and this resulted in nearly identical 
transformations associated with the poses. Relative to the error resolution of the sensing 
devices, these can be considered to be identical solutions. In 633 of the cases, the 
disambiguation algorithm was able to determine the correct pose with only a single 
additional sensory point. In the remaining 31 cases, the algorithm chose an incorrect 
pose from the set of consistent poses. 

We also ran a second version of the disambiguation algorithm on the same set of data. In this case, rather than using the predicted ranges of values to choose a pose, we simply used the technique to generate the next sensing direction, and then ran the RAF recognition algorithm [Grimson and Lozano-Perez 84, 85a] with that sensory point added to the original set of sensory data. In this case, we found that the algorithm identified the correct pose in all 664 cases (the 633 + 31 cases above in which the poses were distinguishable), with only 1 to 3 additional sensory points required to complete the identification.
Appendix


In the appendix, we present a more detailed error analysis of the computation of the 
transformation from model to sensor coordinates. 


Errors in r 

We begin by considering the range of possible errors in the computation of the direction of rotation, $\mathbf{r}$. By the analysis of [Grimson and Lozano-Perez 84], the rotation direction $\mathbf{r}$ is computed by taking two pairs $(\mathbf{n}_m, \mathbf{n}'_m)$, where $\mathbf{n}_m$ is the unit normal of a face of the model and $\mathbf{n}'_m$ is the same unit normal rotated into sensor coordinates, and letting $\mathbf{r}$ be the unit vector in the direction of

$$(\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}) \times (\mathbf{n}_{m,j} - \mathbf{n}'_{m,j}).$$

We assume that we are given $\mathbf{n}_{m,i}$ and $\mathbf{n}'_{m,i}$, and that the sensitivity of the sensor to errors in surface orientation is given by $\epsilon_n$. That is, if $\mathbf{n}'_m$ is the correct surface normal transformed into sensor coordinates and $\mathbf{n}_s$ is the actual measured (or sensed) surface normal, then

$$\mathbf{n}'_m\cdot\mathbf{n}_s \ge \epsilon_n. \qquad (4)$$

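We will consider two stages in deriving bounds on the error in computing $\mathbf{r}$. If we let

$$\mathbf{v}_i = \frac{\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}}{|\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}|} \qquad \text{and} \qquad \mathbf{u}_i = \frac{\mathbf{n}_{m,i} - \mathbf{n}_{s,i}}{|\mathbf{n}_{m,i} - \mathbf{n}_{s,i}|}$$

then the correct value for $\mathbf{r}$ is given by

$$\mathbf{r}_t = \frac{\mathbf{v}_i\times\mathbf{v}_j}{\sqrt{1 - (\mathbf{v}_i\cdot\mathbf{v}_j)^2}}$$

and the computed value is given by

$$\mathbf{r}_c = \frac{\mathbf{u}_i\times\mathbf{u}_j}{\sqrt{1 - (\mathbf{u}_i\cdot\mathbf{u}_j)^2}}.$$

We will first derive bounds on $\mathbf{v}_i\cdot\mathbf{u}_i$ and then use the result to bound $\mathbf{r}_t\cdot\mathbf{r}_c$.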



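The vector $\mathbf{n}_s$ can be represented by the following parameterization:

$$\mathbf{n}_s = \alpha\,\mathbf{n}'_m + \beta\,\mathbf{n}_m + \delta\,(\mathbf{n}_m\times\mathbf{n}'_m).$$

Then equation (4) produces the inequality

$$\alpha + \beta\gamma \ge \epsilon_n \qquad (4')$$

where $\gamma = \mathbf{n}_m\cdot\mathbf{n}'_m$. We will consider the worst case, in which equality holds. Furthermore, the fact that $\mathbf{n}_s$ is a unit vector yields the constraint

$$\alpha^2 + 2\alpha\beta\gamma + \beta^2 + \delta^2(1-\gamma^2) = 1. \qquad (5)$$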

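Given $\mathbf{n}'_m$ and $\mathbf{n}_s$, we first consider the range of possible values for $\mathbf{n}_m - \mathbf{n}_s$ relative to $\mathbf{n}_m - \mathbf{n}'_m$; that is, we want bounds on the range of possible values for

$$E = \mathbf{v}\cdot\mathbf{u} = \frac{(\mathbf{n}_m - \mathbf{n}'_m)\cdot(\mathbf{n}_m - \mathbf{n}_s)}{|\mathbf{n}_m - \mathbf{n}'_m|\,|\mathbf{n}_m - \mathbf{n}_s|}.$$

It is straightforward to show that

$$|\mathbf{n}_m - \mathbf{n}'_m| = \sqrt{2(1-\gamma)}.$$

Furthermore, using equation (5), one can show that

$$|\mathbf{n}_m - \mathbf{n}_s| = \sqrt{2(1 - \beta - \alpha\gamma)}.$$

Finally, expanding out the dot product and substituting yields

$$E = \frac{(1 - \beta + \alpha)(1-\gamma)}{2\sqrt{1-\gamma}\,\sqrt{1 - \beta - \alpha\gamma}}.$$

By equation (4'), $\alpha = \epsilon_n - \beta\gamma$, and substitution yields

$$E = \frac{\sqrt{1-\gamma}\,[1 + \epsilon_n - \beta(1+\gamma)]}{2\sqrt{1 - \epsilon_n\gamma - \beta(1-\gamma^2)}}.$$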

The first problem to consider is the minimum value of $E$ as $\beta$ varies. In particular, we find that

$$\frac{\partial E}{\partial\beta} = \frac{\sqrt{1-\gamma}\,(1+\gamma)^2\,[\beta(1-\gamma) - (1-\epsilon_n)]}{4\,(1 - \epsilon_n\gamma - \beta(1-\gamma^2))^{3/2}}.$$

This is zero when

$$\beta = \frac{1-\epsilon_n}{1-\gamma} \qquad (6)$$

and this is a valid value for $\beta$ provided $\gamma < \epsilon_n$. Taking a second partial derivative of $E$, we find that the sign of $\partial^2 E/\partial\beta^2$ is given by the sign of

$$\beta(1-\gamma^2) + 3(\epsilon_n - \gamma) + \epsilon_n\gamma - 1.$$

Substituting equation (6), we find that the sign of the second partial derivative is given by the sign of

$$2(\epsilon_n - \gamma)$$

and this is positive, since $\gamma < \epsilon_n$. Hence, $E$ achieves a minimum at the value of $\beta$ given by equation (6), and this value is

$$E = \sqrt{\frac{\epsilon_n - \gamma}{1-\gamma}}.$$

If $\gamma > \epsilon_n$, then the minimum value for $E$ occurs for $\beta$ at the limit of its range, namely $\beta = 1$. In this case, $E < 0$, and it is minimized when $\gamma = \sqrt{\epsilon_n}$, taking the value

$$E = -\tfrac{1}{2}\left(1 - \sqrt{\epsilon_n}\right).$$
In general, we will try to restrict the computation of $\mathbf{r}$ to those cases in which $\gamma < \epsilon_n$, in order to keep the magnitude of the possible error in $\mathbf{r}$ small. Note that the minimum value for $E$ is monotonic in $\gamma$; that is, the minimum $E$ increases as $\gamma$ decreases towards $-1$.
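A quick numerical check of the minimization above. It assumes the substituted expression for $E$ and the unit-length constraint (5): with equality in (4'), constraint (5) becomes $\epsilon_n^2 + (\beta^2 + \delta^2)(1-\gamma^2) = 1$, so $\delta^2 \ge 0$ restricts $\beta$ to $|\beta| \le \sqrt{(1-\epsilon_n^2)/(1-\gamma^2)}$. The sketch scans that interval by brute force and compares the result with the closed form from equation (6); all names are hypothetical.

```python
import math


def e_value(beta, gamma, eps_n):
    """E(beta) after substituting alpha = eps_n - beta * gamma."""
    num = math.sqrt(1.0 - gamma) * (1.0 + eps_n - beta * (1.0 + gamma))
    den = 2.0 * math.sqrt(1.0 - eps_n * gamma - beta * (1.0 - gamma ** 2))
    return num / den


def e_min_numeric(gamma, eps_n, samples=20001):
    """Brute-force minimum of E over the beta range allowed by (5) with delta^2 >= 0."""
    beta_max = math.sqrt((1.0 - eps_n ** 2) / (1.0 - gamma ** 2))
    step = 2.0 * beta_max / (samples - 1)
    return min(e_value(-beta_max + k * step, gamma, eps_n) for k in range(samples))


def e_min_closed_form(gamma, eps_n):
    """Minimum from equation (6): sqrt((eps_n - gamma) / (1 - gamma)), for gamma < eps_n."""
    return math.sqrt((eps_n - gamma) / (1.0 - gamma))
```

For $\epsilon_n = \cos 10^\circ$ and $\gamma = 0.5$, both give $E_{\min} \approx 0.9847$, attained near $\beta = (1-\epsilon_n)/(1-\gamma) \approx 0.030$.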


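Thus, we obtain the bound

$$\mathbf{u}_i\cdot\mathbf{v}_i \ge \sqrt{\frac{\epsilon_n - \gamma_i}{1-\gamma_i}} = \delta_i$$

where

$$\gamma_i = \mathbf{n}_{m,i}\cdot\mathbf{n}'_{m,i}.$$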

We are now interested in obtaining bounds on $\mathbf{r}_t\cdot\mathbf{r}_c$. In essence, we have two cones in the Gaussian sphere, centered about $\mathbf{v}_i$ and $\mathbf{v}_j$, whose angular radii have cosines $\delta_i$ and $\delta_j$ respectively. The possible values of $\mathbf{r}_c$ are given by the normalized cross products of vectors within these cones. Clearly, if the cones overlap, then the computation for $\mathbf{r}$ is unstable. We avoid this case by requiring that the cones do not overlap.

Note that if all the error in the computation of both $\mathbf{u}_i$ and $\mathbf{u}_j$ lies in the plane spanned by $\mathbf{v}_i$ and $\mathbf{v}_j$, then the normalization of the cross product will result in the correct value $\mathbf{r}_t = \mathbf{r}_c$. Clearly the maximum deviation of $\mathbf{r}_c$ from $\mathbf{r}_t$ will occur when the error between $\mathbf{u}_i$ and $\mathbf{v}_i$ and the error between $\mathbf{u}_j$ and $\mathbf{v}_j$ lie maximally separated from this plane. This requires that we check two cases, one in which the errors lie on the same side of the plane and one in which the errors lie on opposite sides of the plane. We now consider the first case.


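Let $\eta = \mathbf{v}_i\cdot\mathbf{v}_j$. Then

$$\mathbf{u}_i = \delta_i\mathbf{v}_i + \sqrt{\frac{1-\delta_i^2}{1-\eta^2}}\,(\mathbf{v}_i\times\mathbf{v}_j), \qquad \mathbf{u}_j = \delta_j\mathbf{v}_j + \sqrt{\frac{1-\delta_j^2}{1-\eta^2}}\,(\mathbf{v}_i\times\mathbf{v}_j).$$

Thus

$$\mathbf{u}_i\times\mathbf{u}_j = \delta_i\delta_j(\mathbf{v}_i\times\mathbf{v}_j) + \delta_j\sqrt{\frac{1-\delta_i^2}{1-\eta^2}}\,[(\mathbf{v}_i\times\mathbf{v}_j)\times\mathbf{v}_j] + \delta_i\sqrt{\frac{1-\delta_j^2}{1-\eta^2}}\,[\mathbf{v}_i\times(\mathbf{v}_i\times\mathbf{v}_j)]$$

and

$$(\mathbf{v}_i\times\mathbf{v}_j)\cdot(\mathbf{u}_i\times\mathbf{u}_j) = \delta_i\delta_j(1-\eta^2), \qquad \mathbf{u}_i\cdot\mathbf{u}_j = \delta_i\delta_j\eta + \sqrt{1-\delta_i^2}\sqrt{1-\delta_j^2}.$$

Then, by substitution,

$$\mathbf{r}_t\cdot\mathbf{r}_c \ge \frac{\delta_i\delta_j\sqrt{1-\eta^2}}{\sqrt{1 - \{\delta_i\delta_j\eta + \sqrt{1-\delta_i^2}\sqrt{1-\delta_j^2}\}^2}}. \qquad (8)$$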


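In the second case, we change to

$$\mathbf{u}_j = \delta_j\mathbf{v}_j - \sqrt{\frac{1-\delta_j^2}{1-\eta^2}}\,(\mathbf{v}_i\times\mathbf{v}_j)$$

and thus

$$\mathbf{u}_i\times\mathbf{u}_j = \delta_i\delta_j(\mathbf{v}_i\times\mathbf{v}_j) + \delta_j\sqrt{\frac{1-\delta_i^2}{1-\eta^2}}\,[(\mathbf{v}_i\times\mathbf{v}_j)\times\mathbf{v}_j] - \delta_i\sqrt{\frac{1-\delta_j^2}{1-\eta^2}}\,[\mathbf{v}_i\times(\mathbf{v}_i\times\mathbf{v}_j)].$$

Following through the same algebra leads to a bound on the dot product of

$$\mathbf{r}_t\cdot\mathbf{r}_c \ge \frac{\delta_i\delta_j\sqrt{1-\eta^2}}{\sqrt{1 - \{\delta_i\delta_j\eta - \sqrt{1-\delta_i^2}\sqrt{1-\delta_j^2}\}^2}} \qquad (9)$$

where

$$\eta = \mathbf{v}_i\cdot\mathbf{v}_j, \qquad \mathbf{v}_i = \frac{\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}}{|\mathbf{n}_{m,i} - \mathbf{n}'_{m,i}|}, \qquad \delta_i = \sqrt{\frac{\epsilon_n - \gamma_i}{1-\gamma_i}}, \qquad \gamma_i = \mathbf{n}_{m,i}\cdot\mathbf{n}'_{m,i}.$$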


It is straightforward to show that the bound in equation (9) is in fact smaller than the one in equation (8).

Note that if $\gamma_i$ is close to 1, then the error bound becomes increasingly large. This is to be expected, since in this case $\mathbf{n}_{m,i} \approx \mathbf{n}'_{m,i}$, and thus small errors in the position of $\mathbf{n}'_m$ can lead to large errors in the position of $\mathbf{r}$. Similarly, if $\eta$ is near 1, large errors can also result. If we restrict our computation (where possible) to cases where $\gamma_i$ and $\eta$ are small, then we have an approximate bound on the error in computing the direction of rotation given by

$$\mathbf{r}_t\cdot\mathbf{r}_c \gtrsim \epsilon_n.$$

This bound is supported by the results of the simulations reported in [Grimson and Lozano-Perez 84].


Errors in 9 

We now want to consider bounds on the possible error in computing the remaining parameter of the rotation component of the transformation, namely, the angle of rotation $\theta$. Given the expressions in equation (1) for $\cos\theta$ and $\sin\theta$, the value of $\theta$ is given by

$$\tan\theta = \frac{n'_m \cdot (r \times n_m)}{(r \times n'_m)\cdot(r \times n_m)},$$

where $n_m$ is the unit normal of a face in model coordinates and $n'_m$ is the corresponding normal transformed into sensor coordinates.
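A short numerical check of this formula: rotate a model normal with the Rodrigues form of $R(r, \theta)$ used later in this appendix, then recover $\theta$. The sketch applies `math.atan2` to the numerator and denominator separately, a choice of ours that keeps the correct quadrant, and it assumes $n_m$ is not parallel to $r$ (otherwise both terms vanish). Function names and test values are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate(r, theta, v):
    """R(r, theta) v = cos(theta) v + (1 - cos(theta)) (r.v) r + sin(theta) (r x v)."""
    c, s = math.cos(theta), math.sin(theta)
    rv = dot(r, v)
    rxv = cross(r, v)
    return tuple(c * v[i] + (1 - c) * rv * r[i] + s * rxv[i] for i in range(3))

def angle_from_normals(r, n_m, n_m_prime):
    """theta from tan(theta) = n'_m.(r x n_m) / ((r x n'_m).(r x n_m))."""
    w = cross(r, n_m)
    return math.atan2(dot(n_m_prime, w), dot(cross(r, n_m_prime), w))

r = (1 / 3, 2 / 3, 2 / 3)          # unit rotation axis
n_m = (1.0, 0.0, 0.0)              # model face normal
n_m_prime = rotate(r, 1.2, n_m)    # same normal in sensor coordinates
print(angle_from_normals(r, n_m, n_m_prime))
```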

As in the previous section, we let $r_t$ denote the true direction of rotation and $r_c$ the computed direction of rotation. We will assume that the error in the computed direction of rotation is bounded by

$$r_t \cdot r_c \ge \delta_r,$$

and that the measured value for $n'_m$ is given by $n_s$, such that

$$n'_m \cdot n_s \ge \epsilon_n.$$





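We shall make use of the following four unit vectors:

$$w = \frac{r_t \times n_m}{|r_t \times n_m|}, \qquad u = \frac{r_c \times n_m}{|r_c \times n_m|}, \qquad s = \frac{r_t \times n'_m}{|r_t \times n'_m|}, \qquad t = \frac{r_c \times n_s}{|r_c \times n_s|}.$$

Given these definitions, it is straightforward to show that

$$\tan\theta_t = \frac{n'_m \cdot w}{|r_t \times n'_m|\,(s \cdot w)}, \qquad \tan\theta_c = \frac{n_s \cdot u}{|r_c \times n_s|\,(t \cdot u)}.$$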


Our method in obtaining bounds on the deviation between these two expressions will be to bound $n_s \cdot u$ as a function of $n'_m \cdot w$, and to bound $t \cdot u$ as a function of $s \cdot w$. Once we have bounds on these expressions, they can be combined to bound the overall expression for $\tan\theta$.

First, we consider the range of values for

$$u \cdot w = \frac{r_c \times n_m}{|r_c \times n_m|} \cdot \frac{r_t \times n_m}{|r_t \times n_m|}.$$

In considering the range of values for $u \cdot w$, we note that because of the normalization of the vectors, any error in $r_c$ lying in the plane spanned by $r_t$ and $n_m$ will have no effect on the dot product. Thus, the worst case occurs when all of the error lies perpendicular to this plane. Hence, we need only consider the cases where

$$r_c = \alpha\, r_t \pm \beta\,(r_t \times n_m),$$

where

$$\alpha \ge \delta_r, \qquad 1 = \alpha^2 + \beta^2\,(1 - \cos^2\nu), \qquad \cos\nu = r_t \cdot n_m.$$

Now, the worst case will occur for $\alpha = \delta_r$, in which case

$$\beta = \sqrt{\frac{1-\delta_r^2}{1-\cos^2\nu}},$$

so that the worst case will arise for

$$r_c = \delta_r\, r_t \pm \sqrt{\frac{1-\delta_r^2}{1-\cos^2\nu}}\,(r_t \times n_m).$$


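In this case, the following expressions hold:

$$\begin{aligned}
r_c \times n_m &= \delta_r\,(r_t \times n_m) \pm \sqrt{\frac{1-\delta_r^2}{1-\cos^2\nu}}\,\big(n_m \times (n_m \times r_t)\big)\\
(r_c \times n_m)\cdot(r_c \times n_m) &= 1 - \delta_r^2\cos^2\nu\\
(r_t \times n_m)\cdot(r_t \times n_m) &= 1 - \cos^2\nu\\
(r_c \times n_m)\cdot(r_t \times n_m) &= \delta_r\,(1 - \cos^2\nu).
\end{aligned}$$

Thus, we have the bound $u \cdot w \ge \mu$, where

$$\mu = \frac{\delta_r\sqrt{1-\cos^2\nu}}{\sqrt{1-\delta_r^2\cos^2\nu}}.$$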

At this stage, we have $n_s \cdot n'_m \ge \epsilon_n$ and $u \cdot w \ge \mu$. We can visualize this situation by considering two cones on the Gaussian sphere, one centered about $n'_m$ with radius $\epsilon_n$, and one centered about $w$ with radius $\mu$. We are essentially asking for bounds on the range of dot products between vectors lying within these two different cones. Assuming that the cones do not overlap, the maximum and minimum dot products will occur for the minimum and maximum angles between elements of the cones, respectively, and this clearly occurs for vectors lying in the cones and in the plane spanned by $n'_m$ and $w$.
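The perpendicular-error configuration used above can be checked numerically. The sketch below (helper names and numeric values are illustrative) places $r_t$ and $n_m$ with $r_t \cdot n_m = \cos\nu$, builds $r_c = \delta_r r_t + \beta\,(r_t \times n_m)$ with $\beta = \sqrt{(1-\delta_r^2)/(1-\cos^2\nu)}$, and compares $u \cdot w$ with $\mu = \delta_r\sqrt{1-\cos^2\nu}/\sqrt{1-\delta_r^2\cos^2\nu}$, which follows from the cross-product expressions for this configuration. It checks only that configuration; it is not a search over all $r_c$ with $r_t \cdot r_c \ge \delta_r$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def mu(delta_r, cos_nu):
    return delta_r * math.sqrt(1 - cos_nu**2) / math.sqrt(1 - delta_r**2 * cos_nu**2)

def perpendicular_case_uw(delta_r, cos_nu):
    """u . w when all of the error in r_c is perpendicular to the r_t, n_m plane."""
    sin_nu = math.sqrt(1 - cos_nu**2)
    r_t = (0.0, 0.0, 1.0)
    n_m = (sin_nu, 0.0, cos_nu)                 # r_t . n_m = cos(nu)
    rt_x_n = cross(r_t, n_m)
    beta = math.sqrt((1 - delta_r**2) / (1 - cos_nu**2))
    r_c = tuple(delta_r * a + beta * b for a, b in zip(r_t, rt_x_n))  # unit length
    w = unit(rt_x_n)
    u = unit(cross(r_c, n_m))
    return dot(u, w)

print(perpendicular_case_uw(0.98, 0.3), mu(0.98, 0.3))
```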

Suppose we denote:

$$\cos\alpha = n'_m \cdot w, \qquad \cos\phi = \epsilon_n, \qquad \cos\psi = \delta_r, \qquad \cos\zeta = \mu.$$

Clearly, the extremal angles between these two cones are given by

$$\alpha \pm [\phi + \zeta].$$

Thus, the range of possible values for $n_s \cdot u$ is bounded by

$$\cos\left(\alpha \pm [\phi + \zeta]\right).$$
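The cone argument can be spot-checked by sampling. In the sketch below (the sampling scheme, seed and angles are our own illustrative choices), $n_s$ is drawn within angle $\phi$ of $n'_m$ and $u$ within angle $\zeta$ of $w$, with $\alpha \ge \phi + \zeta$ so the cones do not overlap and $\alpha + \phi + \zeta \le \pi$; every sampled $n_s \cdot u$ should then lie in $[\cos(\alpha + [\phi + \zeta]),\ \cos(\alpha - [\phi + \zeta])]$.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def sample_in_cone(axis, half_angle, rng):
    """Random unit vector within half_angle of the unit vector axis."""
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    b = unit(cross(axis, helper))
    c = cross(axis, b)
    t = half_angle * rng.random()
    p = 2 * math.pi * rng.random()
    return tuple(math.cos(t) * a + math.sin(t) * (math.cos(p) * bb + math.sin(p) * cc)
                 for a, bb, cc in zip(axis, b, c))

def dot_range_check(alpha, phi, zeta, trials=20000, seed=0):
    """Predicted bounds and observed extremes of n_s . u."""
    rng = random.Random(seed)
    n_m_prime = (1.0, 0.0, 0.0)
    w = (math.cos(alpha), math.sin(alpha), 0.0)   # n'_m . w = cos(alpha)
    lo, hi = math.cos(alpha + phi + zeta), math.cos(alpha - phi - zeta)
    seen_min, seen_max = 1.0, -1.0
    for _ in range(trials):
        d = dot(sample_in_cone(n_m_prime, phi, rng), sample_in_cone(w, zeta, rng))
        seen_min, seen_max = min(seen_min, d), max(seen_max, d)
    return lo, seen_min, seen_max, hi

print(dot_range_check(1.2, 0.1, 0.15))
```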




$$|r_t \times n'_m| = \sin\nu, \qquad |r_c \times n_s| = \sin\omega.$$


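Finally, writing $\cos\beta = s \cdot w$, the term $t \cdot u$ is bounded in the same way, with worst-case value $\cos(\beta + [\psi + \gamma])$, where $\gamma$ bounds the angular deviation between $t$ and $s$.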



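Thus, by gathering all these expressions together, we obtain the following worst-case expressions:

$$\tan\theta_t = \frac{\cos\alpha}{\sin\nu\,\cos\beta}, \qquad \tan\theta_c = \frac{\cos\left(\alpha - [\phi + \zeta]\right)}{\sin\omega\,\cos\left(\beta + [\psi + \gamma]\right)}.$$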

We note that this expression for the computed value of $\tan\theta$ has the expected limiting-case behavior. In particular, if the error parameters $\phi = \psi = 0$, then the entire expression for $\tan\theta_c$ reduces to that for $\tan\theta_t$.

We now seek an estimate for the error in the computed value of $\theta$, in the special case of $\phi$ and $\psi$ both small. In particular, we would like an expression for $\Delta\theta$ such that

$$\tan(\theta_t + \Delta\theta) \approx \tan\theta_c.$$

In this way, we can place a bound on the possible error in the computation of $\theta$.

In the limiting case of $\phi$ and $\psi$ small, the bound $\mu \approx \delta_r$, so that $\zeta \approx \psi$. Furthermore, $\cos\omega \approx \cos\nu$, so that $\cos\gamma \approx \cos\phi\cos\psi \approx \cos(\phi + \psi)$ and hence $\gamma \approx \phi + \psi$. As a consequence, finding an approximation for the deviation in $\tan\theta$ reduces to comparing the worst-case deviation between

$$\frac{\cos\left(\alpha - [\phi + \psi]\right)}{\cos\left(\beta + [\phi + 2\psi]\right)}$$

and

$$\tan\theta_t = \frac{\cos\alpha}{\cos\beta}. \tag{10}$$

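By expansion,

$$\tan(\theta_t + \Delta\theta) = \frac{\tan\theta_t + \tan\Delta\theta}{1 - \tan\theta_t\tan\Delta\theta},$$

and if we substitute $\Delta\theta \approx \phi + \psi$ and use equation (10), then this expression can be expanded into

$$\tan(\theta_t + \Delta\theta) = \frac{\cos(\alpha - \phi - \psi) + [\cos\beta - \sin\alpha]\sin(\phi + \psi)}{\cos(\beta + \phi + \psi) + [\sin\beta - \cos\alpha]\sin(\phi + \psi)}. \tag{11}$$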
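Equation (11) is an exact rearrangement of the tangent addition formula once $\tan\theta_t = \cos\alpha/\cos\beta$; the approximation enters only through $\Delta\theta \approx \phi + \psi$. The sketch below (arbitrary illustrative angles; function names are ours) confirms the identity numerically and shows that the bracketed terms vanish when $\cos\beta = \sin\alpha$ exactly.

```python
import math

def rhs_11(alpha, beta, delta):
    """Right-hand side of (11), with phi + psi replaced by delta."""
    num = math.cos(alpha - delta) + (math.cos(beta) - math.sin(alpha)) * math.sin(delta)
    den = math.cos(beta + delta) + (math.sin(beta) - math.cos(alpha)) * math.sin(delta)
    return num / den

def lhs_11(alpha, beta, delta):
    theta_t = math.atan(math.cos(alpha) / math.cos(beta))   # equation (10)
    return math.tan(theta_t + delta)

alpha, beta, delta = 0.7, 1.1, 0.05
print(lhs_11(alpha, beta, delta), rhs_11(alpha, beta, delta))

# With cos(beta) = sin(alpha) the bracketed terms drop out exactly.
beta2 = math.pi / 2 - alpha
print(rhs_11(alpha, beta2, delta), math.cos(alpha - delta) / math.cos(beta2 + delta))
```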

Now, if $\cos\beta \approx \sin\alpha$, then the second term in both the numerator and the denominator can be ignored, especially since $\sin(\phi + \psi)$ is also small. Requiring this to be true is equivalent to requiring that

$$(s \cdot w)^2 + (n'_m \cdot w)^2 \approx 1,$$

that is, since $n'_m$, $s$ and $n'_m \times s$ form an orthonormal basis, that the component of the unit vector $w$ in the direction of $n'_m \times s$ be small. It is straightforward to show that

$$\big(w \cdot (n'_m \times s)\big)^2 = \left[\cot\nu\,(n_m \cdot s)\right]^2.$$

Since we have already indicated that we will restrict our computation of the transformation parameters to those cases in which $r \cdot n_m \ll 1$, it follows that $\cot\nu$ is small, and the second terms in both the numerator and denominator in equation (11) can be disregarded.

By dropping these terms, we see that the remaining expression reduces to

$$\tan\left(\theta_t + [\phi + \psi]\right) \approx \frac{\cos\left(\alpha - [\phi + \psi]\right)}{\cos\left(\beta + [\phi + \psi]\right)}.$$

Thus, if $\psi$ is small enough, it follows that the worst-case deviation is given by $\theta_c \approx \theta_t + (\phi + \psi)$, and hence that a good approximation to the error, $\Delta\theta$, in the computed




value of the rotation, $\theta$, is given by

$$\Delta\theta \approx |\phi + \psi|.$$


Errors in Rv

We have computed expressions for the possible error in $r$ and $\theta$. In particular, we will denote the error in $\theta$ by $\Delta\theta$ and the vector error in $r$ by $\delta r$, such that $r \cdot \delta r = 0$. We now consider the problem of estimating bounds on the possible error in applying the computed rotation matrix to an arbitrary vector $v$. We know that the rotational component of the transformation of $v$ is given by

$$R(r, \theta)\,v = \cos\theta\, v + (1 - \cos\theta)(r \cdot v)\, r + \sin\theta\,(r \times v),$$

where $r$ and $\theta$ are the parameters determining the rotation.

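We first consider the variation of this expression with respect to the angle of rotation. In particular, under the assumption that $\Delta\theta$ is small, and writing $\hat v = v/|v|$, the following holds:

$$\begin{aligned}
R(r, \theta + \Delta\theta)\,v - R(r, \theta)\,v &= |v|\,\big\{(\cos(\theta + \Delta\theta) - \cos\theta)\,\hat v + (\cos\theta - \cos(\theta + \Delta\theta))(r \cdot \hat v)\, r + (\sin(\theta + \Delta\theta) - \sin\theta)(r \times \hat v)\big\}\\
&\approx |v|\,\big\{-\Delta\theta\sin\theta\,\hat v + \Delta\theta\sin\theta\,(r \cdot \hat v)\, r + \Delta\theta\cos\theta\,(r \times \hat v)\big\}.
\end{aligned}$$

Straightforward algebraic manipulation shows that the magnitude of this term is given by

$$|v|\,\Delta\theta\,\sqrt{1 - (r \cdot \hat v)^2},$$

and this is bounded above by $\Delta\theta\,|v|$.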
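The magnitude claim can be checked by differentiating the rotation formula with respect to $\theta$. The sketch below (illustrative vectors; function names are ours) compares the analytic derivative $-\sin\theta\,v + \sin\theta\,(r\cdot v)\,r + \cos\theta\,(r \times v)$ with a central finite difference, and its length with $|v|\sqrt{1-(r\cdot\hat v)^2}$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def rotate(r, theta, v):
    """R(r, theta) v for a unit axis r."""
    c, s = math.cos(theta), math.sin(theta)
    rv, rxv = dot(r, v), cross(r, v)
    return tuple(c * v[i] + (1 - c) * rv * r[i] + s * rxv[i] for i in range(3))

def d_rotate_d_theta(r, theta, v):
    """Analytic derivative of R(r, theta) v with respect to theta."""
    c, s = math.cos(theta), math.sin(theta)
    rv, rxv = dot(r, v), cross(r, v)
    return tuple(-s * v[i] + s * rv * r[i] + c * rxv[i] for i in range(3))

r = (2 / 3, -1 / 3, 2 / 3)          # unit axis
v = (1.5, 0.5, -2.0)
theta, h = 0.8, 1e-6
fd = tuple((a - b) / (2 * h)
           for a, b in zip(rotate(r, theta + h, v), rotate(r, theta - h, v)))
an = d_rotate_d_theta(r, theta, v)
predicted = norm(v) * math.sqrt(1 - (dot(r, v) / norm(v)) ** 2)
print(norm(an), predicted)
```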

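Next, we consider the variation with respect to $r$, so that

$$R(r + \delta r, \theta)\,v - R(r, \theta)\,v = |v|\,\big\{(1 - \cos\theta)\,[(r \cdot \hat v)\,\delta r + (\delta r \cdot \hat v)(r + \delta r)] + \sin\theta\,(\delta r \times \hat v)\big\}. \tag{2}$$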

We consider the magnitude of the second term in the right-hand side of this expression by taking the dot product of this vector with itself. If we ignore terms in $(\delta r \cdot \delta r)$, since the assumption of $\delta r$ small implies such terms are negligible, then the magnitude of the second term in equation (2) is given by

$$\left|(1 - \cos\theta)(\delta r \cdot \hat v) + \sin\theta\,\big(\hat v \cdot (r \times \delta r)\big)\right|. \tag{3}$$

We now consider a bound for this expression. Suppose we let $k$ denote the unit vector in the direction of $\delta r$, and let $k \cdot \hat v = \cos\xi$. Since the worst case will occur when $\hat v$ lies entirely in the plane spanned by $\delta r$ and $r \times \delta r$, equation (3) reduces to

$$|\delta r|\,\left|(1 - \cos\theta)\cos\xi + \sin\theta\sin\xi\right| = |\delta r|\,\left|\cos\xi - \cos(\theta + \xi)\right|.$$

It is clear that the worst possible value for this expression is $2|\delta r|$. Thus, the maximum value for the magnitude of the second term in equation (2) is $2|\delta r|$, and overall, the maximum deviation due to a variation in $r$ is given by

$$2\,|v|\,|\delta r|.$$

Finally, we can piece together these two variations. By ignoring higher-order terms, it is clear that a Taylor series expansion of $R(r, \theta)$ yields the following bound on errors in the computed value of a rotation:

$$\left|R(r + \delta r, \theta + \Delta\theta)\,v - R(r, \theta)\,v\right| \le \left(2|\delta r| + |\Delta\theta|\right)|v|.$$
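A randomized spot check of this first-order bound (our own test harness: the sampling ranges, the seed, and the 1% allowance for the neglected higher-order terms are assumptions). It perturbs a random axis by a small $\delta r \perp r$, renormalizes, perturbs $\theta$ by a small $\Delta\theta$, and reports the largest ratio of the actual change in $R v$ to $(2|\delta r| + |\Delta\theta|)|v|$.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)

def rotate(r, theta, v):
    c, s = math.cos(theta), math.sin(theta)
    rv, rxv = dot(r, v), cross(r, v)
    return tuple(c * v[i] + (1 - c) * rv * r[i] + s * rxv[i] for i in range(3))

def random_unit(rng):
    return unit(tuple(rng.gauss(0, 1) for _ in range(3)))

def worst_ratio(trials=5000, max_err=1e-3, seed=1):
    """Largest observed deviation / ((2|dr| + |dtheta|) |v|)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        r = random_unit(rng)
        v = tuple(rng.gauss(0, 1) for _ in range(3))         # arbitrary vector
        theta = rng.uniform(-math.pi, math.pi)
        eps = rng.uniform(1e-6, max_err)                     # |dr|
        k = unit(cross(r, random_unit(rng)))                 # unit vector perpendicular to r
        r_c = unit(tuple(a + eps * b for a, b in zip(r, k)))  # r + dr, renormalized
        d_theta = rng.uniform(-max_err, max_err)
        diff = [a - b for a, b in zip(rotate(r_c, theta + d_theta, v), rotate(r, theta, v))]
        bound = (2 * eps + abs(d_theta)) * norm(v)
        worst = max(worst, norm(diff) / bound)
    return worst

print(worst_ratio())
```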
Now, if the errors $\phi$ and $\psi$ are small, then we know that

$$|\Delta\theta| \le |\phi + \psi|.$$

Furthermore,

$$|\delta r| = |\sin\psi| \approx |\psi|,$$

and this implies a bound on the variation in $v$ of

$$|3\psi + \phi|\,|v|.$$


Moreover, if we are careful to restrict our computation appropriately, then $\psi \ll \phi$, and thus

$$\left|R(r + \delta r, \theta + \Delta\theta)\,v - R(r, \theta)\,v\right| \lesssim |\phi|\,|v|.$$


Errors in v₀

We now consider bounds on the error associated with computing the translation component $v_0$ of the transformation. Recall that the correct form for $v_0$ is given by

$$[n'_{m,i}, n'_{m,j}, n'_{m,k}]\,v_0 = (n'_{m,i} \cdot p_{s,i} - d_i)(n'_{m,j} \times n'_{m,k}) + (n'_{m,j} \cdot p_{s,j} - d_j)(n'_{m,k} \times n'_{m,i}) + (n'_{m,k} \cdot p_{s,k} - d_k)(n'_{m,i} \times n'_{m,j}),$$

where $n_{m,i}$ is a face normal in model coordinates, $n'_{m,i}$ is the transformed normal in sensor coordinates, $p_{s,i}$ is the position vector of the contact point in sensor coordinates, $d_i$ is the constant offset for face $i$, and $[a, b, c] = a \cdot (b \times c)$ is the scalar triple product. We will consider error ranges for each of the components

$$(n'_{m,k} \cdot p_{s,k} - d_k)(n'_{m,i} \times n'_{m,j})$$

separately.

We let $s = n'_{m,k} \cdot p_{s,k} - d_k$ and $v = n'_{m,i} \times n'_{m,j}$, so that the correct component is simply $s\,v$ and the computed component is

$$(s + \Delta)(\zeta\, v + \eta\, u),$$

where $u$ is a unit vector orthogonal to $v$, and $\Delta$, $\zeta$ and $\eta$ are values to be determined. We assume that the measured position vector is given by $p_i + \delta p_i$, where $\delta p_i$ is a vector of magnitude $\epsilon_d$, and the measured normal is given by $n_{s,i}$, such that $n_{s,i} \cdot n'_{m,i} \ge \epsilon_n$.

First note that the magnitude of the error in computing the component of the translation is given by

$$\left|s\,v - (s + \Delta)(\zeta\, v + \eta\, u)\right| = \sqrt{\left[s(1 - \zeta) - \Delta\zeta\right]^2 (v \cdot v) + \eta^2 (s + \Delta)^2}. \tag{12}$$

Thus, we need to find bounds for $s$, $\Delta$, $(v \cdot v)$, $\zeta$ and $\eta^2$. We know that $s$ is a given scalar value. If the angle $\xi$ between the face normals is given by $n'_{m,i} \cdot n'_{m,j} = \cos\xi$, then

$$v \cdot v = 1 - \cos^2\xi = \sin^2\xi.$$

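It is straightforward to show that

$$\begin{aligned}
\Delta &= \left|\big(n_s \cdot (p + \delta p) - d\big) - (n'_m \cdot p - d)\right|\\
&\le \epsilon_d + \left|(n_s - n'_m) \cdot p\right|\\
&\le \epsilon_d + |p|\,\sqrt{2}\,\sqrt{1 - \epsilon_n},
\end{aligned}$$

where the last step uses $|n_s - n'_m|^2 = 2 - 2\,n_s \cdot n'_m \le 2(1 - \epsilon_n)$.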
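A randomized check of this bound on $\Delta$ (the sampling scheme, seed and numeric values are illustrative assumptions): draw $n_s$ within angle $\cos^{-1}\epsilon_n$ of $n'_m$ and a position error $\delta p$ with $|\delta p| \le \epsilon_d$, and compare the observed $\Delta$ with $\epsilon_d + |p|\sqrt{2}\sqrt{1-\epsilon_n}$. The ratio should never exceed 1.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def sample_in_cone(axis, half_angle, rng):
    """Random unit vector within half_angle of the unit vector axis."""
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    b = unit(cross(axis, helper))
    c = cross(axis, b)
    t = half_angle * rng.random()
    p = 2 * math.pi * rng.random()
    return tuple(math.cos(t) * a + math.sin(t) * (math.cos(p) * bb + math.sin(p) * cc)
                 for a, bb, cc in zip(axis, b, c))

def max_delta_ratio(eps_n=0.995, eps_d=0.01, trials=20000, seed=3):
    """Largest observed Delta / (eps_d + |p| sqrt(2) sqrt(1 - eps_n))."""
    rng = random.Random(seed)
    phi = math.acos(eps_n)                                     # cone half-angle
    worst = 0.0
    for _ in range(trials):
        n_true = unit(tuple(rng.gauss(0, 1) for _ in range(3)))   # n'_m
        n_s = sample_in_cone(n_true, phi, rng)                    # n_s . n'_m >= eps_n
        p = tuple(rng.uniform(-2.0, 2.0) for _ in range(3))
        d = rng.uniform(-1.0, 1.0)                                # face offset (cancels)
        direction = unit(tuple(rng.gauss(0, 1) for _ in range(3)))
        length = eps_d * rng.random()                             # |dp| <= eps_d
        p_meas = tuple(a + length * b for a, b in zip(p, direction))
        delta = abs((dot(n_s, p_meas) - d) - (dot(n_true, p) - d))
        bound = eps_d + math.sqrt(dot(p, p)) * math.sqrt(2) * math.sqrt(1 - eps_n)
        worst = max(worst, delta / bound)
    return worst

print(max_delta_ratio())
```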

Next, we consider bounds for $\zeta$ and $\eta^2$, where

$$n_{s,i} \times n_{s,j} = \zeta\,(n'_{m,i} \times n'_{m,j}) + \eta\, h \tag{13}$$

for some unit vector $h$ orthogonal to $(n'_{m,i} \times n'_{m,j})$. Now

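$$\begin{aligned}
(n_{s,i} \times n_{s,j}) \cdot (n'_{m,i} \times n'_{m,j}) &= (n_{s,i} \cdot n'_{m,i})(n_{s,j} \cdot n'_{m,j}) - (n_{s,i} \cdot n'_{m,j})(n_{s,j} \cdot n'_{m,i})\\
&\ge \epsilon_n^2 - (n_{s,i} \cdot n'_{m,j})(n_{s,j} \cdot n'_{m,i}).
\end{aligned}$$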

Moreover, the worst case (i.e., largest value) for $n_{s,i} \cdot n'_{m,j}$ is $\cos(\xi - \cos^{-1}\epsilon_n)$, since $\xi - \cos^{-1}\epsilon_n$ is the smallest angle possible between the cone of radius $\epsilon_n$ about $n'_{m,i}$ and the vector $n'_{m,j}$. (Note that we have assumed that the two cones do not overlap, i.e., $\xi > 2\cos^{-1}\epsilon_n$.) As before, we let $\epsilon_n = \cos\phi$. Then, by substitution and expansion, we get

$$(n_{s,i} \times n_{s,j}) \cdot (n'_{m,i} \times n'_{m,j}) \ge \sin\xi\,\sin(\xi - 2\phi).$$

At the same time, from equation (13),

$$(n_{s,i} \times n_{s,j}) \cdot (n'_{m,i} \times n'_{m,j}) = \zeta\,\sin^2\xi,$$

so that we have the bound

$$\zeta \ge \frac{\sin(\xi - 2\phi)}{\sin\xi}.$$


Now the squared length of $(n_{s,i} \times n_{s,j})$ is given by

$$1 - (n_{s,i} \cdot n_{s,j})^2,$$

and to get a bound on $\eta^2$, we want to maximize this expression. As before, the worst case occurs when the $n_s$ vectors lie at the limits of their respective cones, and

$$n_{s,i} \cdot n_{s,j} = \cos(\xi + 2\phi).$$

We also have, however, from equation (13),

$$(n_{s,i} \times n_{s,j}) \cdot (n_{s,i} \times n_{s,j}) = \zeta^2\sin^2\xi + \eta^2 \le 1 - \cos^2(\xi + 2\phi).$$

Substitution and expansion yield the following bound:

$$\eta^2 \le \sin(4\phi)\,\sin(2\xi).$$
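The two bounds on $\zeta$ and $\eta^2$ can be spot-checked by sampling measured normals inside their cones. The sketch (illustrative angles, seed and helper names) decomposes $n_{s,i} \times n_{s,j}$ as in (13) and records the smallest $\zeta$ and the largest $\eta^2$ seen. As an assumption of ours, it keeps $2\phi < \xi$ and $\xi + 2\phi \le \pi/2$, so that the cones do not overlap and $\sin^2$ is increasing over the range of angles between the measured normals.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def sample_in_cone(axis, half_angle, rng):
    """Random unit vector within half_angle of the unit vector axis."""
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    b = unit(cross(axis, helper))
    c = cross(axis, b)
    t = half_angle * rng.random()
    p = 2 * math.pi * rng.random()
    return tuple(math.cos(t) * a + math.sin(t) * (math.cos(p) * bb + math.sin(p) * cc)
                 for a, bb, cc in zip(axis, b, c))

def zeta_eta_check(xi, phi, trials=20000, seed=2):
    """Smallest zeta and largest eta^2 over sampled measured normals."""
    rng = random.Random(seed)
    ni = (1.0, 0.0, 0.0)                        # n'_{m,i}
    nj = (math.cos(xi), math.sin(xi), 0.0)      # n'_{m,j}, at angle xi from n'_{m,i}
    v = cross(ni, nj)
    vv = dot(v, v)                              # sin^2(xi)
    min_zeta, max_eta2 = float("inf"), 0.0
    for _ in range(trials):
        c = cross(sample_in_cone(ni, phi, rng), sample_in_cone(nj, phi, rng))
        zeta = dot(c, v) / vv                   # component along v, as in (13)
        eta2 = dot(c, c) - zeta**2 * vv         # squared part orthogonal to v
        min_zeta, max_eta2 = min(min_zeta, zeta), max(max_eta2, eta2)
    return min_zeta, max_eta2

xi, phi = 1.0, 0.1
print(zeta_eta_check(xi, phi),
      math.sin(xi - 2 * phi) / math.sin(xi), math.sin(4 * phi) * math.sin(2 * xi))
```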


We are now ready to bound the error in computing each component of the translation vector $v_0$. From equation (12), the magnitude of the error is given by

$$\sqrt{\left[s(1 - \zeta) - \Delta\zeta\right]^2 (v \cdot v) + \eta^2 (s + \Delta)^2}.$$

Substitution of the various bounds yields

$$\sqrt{\left[s\sin\xi - (s + \Delta)\sin(\xi - 2\phi)\right]^2 + (s + \Delta)^2\sin(2\xi)\sin(4\phi)},$$

where

$$s = n'_{m,k} \cdot p_{s,k} - d_k, \qquad \Delta \le \epsilon_d + |p_{s,k}|\,\sqrt{2}\,\sqrt{1 - \epsilon_n}, \qquad \cos\xi = n'_{m,i} \cdot n'_{m,j}.$$

Note that as $\phi \to 0$, this bound reduces to $|\Delta\sin\xi|$. Furthermore, as $\epsilon_d \to 0$ as well, this expression tends to 0, so that the error in the computed translation vanishes as the errors in the measurements do.

Typically, we will want to restrict our computations to cases in which the faces are roughly orthogonal, so that $\xi \approx \pi/2$. In this case, the bound reduces to the simple expression

$$\left|s - (s + \Delta)\cos(2\phi)\right|.$$
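The final bound is straightforward to evaluate. The sketch below (function names and sample numbers are ours, in arbitrary but consistent length units) implements the substituted expression and the bound on $\Delta$, with $\epsilon_n = \cos\phi$. It assumes $2\phi < \xi \le \pi/2$, so the term under the square root is non-negative, and it reproduces the two limiting cases noted in the text: $|\Delta\sin\xi|$ as $\phi \to 0$, and $|s - (s+\Delta)\cos 2\phi|$ at $\xi = \pi/2$.

```python
import math

def translation_error_bound(s, delta, xi, phi):
    """sqrt([s sin(xi) - (s + delta) sin(xi - 2 phi)]^2 + (s + delta)^2 sin(2 xi) sin(4 phi)).

    Assumes 2 * phi < xi <= pi / 2.
    """
    a = s * math.sin(xi) - (s + delta) * math.sin(xi - 2 * phi)
    b = (s + delta) ** 2 * math.sin(2 * xi) * math.sin(4 * phi)
    return math.sqrt(a * a + b)

def delta_bound(eps_d, p_norm, eps_n):
    """Upper bound on Delta: eps_d + |p_{s,k}| sqrt(2) sqrt(1 - eps_n)."""
    return eps_d + p_norm * math.sqrt(2) * math.sqrt(1 - eps_n)

eps_n = 0.999
phi = math.acos(eps_n)                    # eps_n = cos(phi)
delta = delta_bound(0.5, 100.0, eps_n)    # hypothetical eps_d = 0.5, |p| = 100
print(translation_error_bound(40.0, delta, 1.2, phi))
```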



